New backup volume sometimes stuck in 'creating' state

Bug #1877076 reported by Aurelien Lourot
This bug affects 1 person
Affects: OpenStack Cinder-backup Charm
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: (not set)

Bug Description

Hitting us here:

https://review.opendev.org/#/c/712671/
https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_func_full/openstack/charm-cinder-backup/712671/2/5413/consoleText.test_charm_func_full_8585.txt

2020-04-21 10:33:32 [INFO] File "/tmp/tmp.BlcV5RMGfx/func/lib/python3.5/site-packages/zaza/openstack/utilities/openstack.py", line 1808, in _resource_reaches_status
2020-04-21 10:33:32 [INFO] expected_status,))
2020-04-21 10:33:32 [INFO] AssertionError: Resource in creating state, waiting for available

openstack_utils.resource_reaches_status(
    self.cinder_client.backups,
    vol_backup.id,
    wait_iteration_max_time=180,
    stop_after_attempt=15,
    expected_status='available',
    msg='Volume status wait')

Aurelien Lourot (aurelien-lourot) wrote :

It looks like it sometimes takes more than 180 seconds for the backup to change from 'creating' to 'available'.

Aurelien Lourot (aurelien-lourot) wrote :

Actually, when this happens, the backup remains in 'creating' state for 22 minutes until we time out. We suspect this is an infrastructure issue on our CI system. Last time this happened [0], the hypervisor running cinder-backup was 'koch'. Older occurrences have been log-rotated away, so let's see next time whether it's still the same hypervisor.

0: https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_func_full/openstack/charm-cinder-backup/712671/4/5637/index.html
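
For reference, the 22 minutes is roughly what wait_iteration_max_time=180 and stop_after_attempt=15 would add up to if the wait helper sleeps with an exponential backoff capped at wait_iteration_max_time. That backoff shape is an assumption, not something shown in this report, and total_wait_seconds is a hypothetical helper that only does the arithmetic:

```python
def total_wait_seconds(attempts, max_wait, multiplier=1, base=2):
    """Sum the sleeps between `attempts` tries under capped exponential backoff.

    Assumed schedule (hypothetical, not taken from this report): the sleep
    after try number n (counting from 0) is multiplier * base**n seconds,
    capped at max_wait.
    """
    # One sleep happens between consecutive tries, so attempts - 1 sleeps.
    return sum(min(multiplier * base ** n, max_wait) for n in range(attempts - 1))


total = total_wait_seconds(attempts=15, max_wait=180)
print(total, "seconds, i.e.", total // 60, "minutes and", total % 60, "seconds")
```

Under that assumption the total is 1335 seconds, a little over 22 minutes, so the test is already waiting far longer than a single 180-second interval.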

Aurelien Lourot (aurelien-lourot) wrote :

Can't reproduce anymore after several attempts. Closing for now.

Changed in charm-cinder-backup:
status: In Progress → Invalid
assignee: Aurelien Lourot (aurelien-lourot) → nobody
Aurelien Lourot (aurelien-lourot) wrote :

Happened again here [0]. This time the hypervisor running cinder-backup was 'flemming', not 'koch', so I think this invalidates our theory of an infrastructure issue. Re-opening and attaching crashdump.

0: https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_func_full/openstack/charm-cinder-backup/739929/4/6639/index.html

Aurelien Lourot (aurelien-lourot) wrote :

Also, this time it was trusty/mitaka instead of xenial/mitaka.

summary: test_410_cinder_vol_create_backup_delete_restore_pool_inspect failing on
- xenial/mitaka
+ mitaka
Changed in charm-cinder-backup:
status: Invalid → New
Aurelien Lourot (aurelien-lourot) wrote : Re: test_410_cinder_vol_create_backup_delete_restore_pool_inspect failing on mitaka
Changed in charm-cinder-backup:
status: New → In Progress
assignee: nobody → Aurelien Lourot (aurelien-lourot)
tags: added: unstable-test
Aurelien Lourot (aurelien-lourot) wrote :

Now taking at least trusty-mitaka out of the gate: https://review.opendev.org/#/c/748623/

Aurelien Lourot (aurelien-lourot) wrote :
summary: - test_410_cinder_vol_create_backup_delete_restore_pool_inspect failing on
- mitaka
+ test_410_cinder_vol_create_backup_delete_restore_pool_inspect failing
Alex Kavanagh (ajkavanagh) wrote : Re: test_410_cinder_vol_create_backup_delete_restore_pool_inspect failing

Is this just a timeout issue? If so, let's increase the timeout and see if it goes away?

Aurelien Lourot (aurelien-lourot) wrote :

No, because the timeout is already 22 minutes; see comment #22.

Aurelien Lourot (aurelien-lourot) wrote :

I meant comment #2

Nikolay Vinogradov (nikolay.vinogradov) wrote :

Also seeing this on a customer deployment (bionic/ussuri). A backup created with 'openstack volume backup create' is stuck in 'creating' state (using the current revision of the cinder-backup charm):

$ openstack volume backup show 02e77e70-c502-4de9-8487-7aa1b42e4a98
+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| availability_zone     | None                                 |
| container             | None                                 |
| created_at            | 2020-10-08T17:10:11.000000           |
| data_timestamp        | 2020-10-08T17:10:11.000000           |
| description           | None                                 |
| fail_reason           | None                                 |
| has_dependent_backups | False                                |
| id                    | 02e77e70-c502-4de9-8487-7aa1b42e4a98 |
| is_incremental        | False                                |
| name                  | test-volume-2-backup                 |
| object_count          | 0                                    |
| size                  | 6                                    |
| snapshot_id           | None                                 |
| status                | creating                             |
| updated_at            | 2020-10-08T17:10:11.000000           |
| volume_id             | 96f2d720-e6cb-46af-8b64-cf8d69aafcef |
+-----------------------+--------------------------------------+
ubuntu@infra1:~/deployment/cpe-deployments$ date
Thu Oct 8 23:03:32 UTC 2020

It's definitely not the timeout.
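
To put a number on that, the gap between created_at and the output of date can be worked out directly. This is a minimal sketch using the two timestamps shown above; it assumes created_at is in UTC, which the output itself does not state:

```python
from datetime import datetime, timezone

# created_at copied from the backup record; treating it as UTC is an assumption.
created_at = datetime.fromisoformat("2020-10-08T17:10:11.000000").replace(
    tzinfo=timezone.utc
)
# Output of `date` on the deployment host: Thu Oct 8 23:03:32 UTC 2020.
checked_at = datetime(2020, 10, 8, 23, 3, 32, tzinfo=timezone.utc)

elapsed = checked_at - created_at
print(elapsed)  # 5:53:21
```

That is nearly six hours in 'creating' with updated_at unchanged, far beyond the 22-minute wait used in the tests.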

Aurelien Lourot (aurelien-lourot) wrote :

This seems to be an upstream issue. FYI, here is a workaround in our tests to unblock our gate: https://github.com/openstack-charmers/zaza-openstack-tests/pull/434

Changed in charm-cinder-backup:
status: In Progress → Triaged
OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-cinder-backup (master)

Reviewed: https://review.opendev.org/748623
Committed: https://git.openstack.org/cgit/openstack/charm-cinder-backup/commit/?id=74dc1566601c56ba696de4c7a49dd83f76a150ed
Submitter: Zuul
Branch: master

commit 74dc1566601c56ba696de4c7a49dd83f76a150ed
Author: Aurelien Lourot <email address hidden>
Date: Fri Aug 28 14:05:05 2020 +0200

    Add Victoria to the test gate

    Also sync libraries.
    Also take trusty-mitaka out of the gate because of linked bug.
    Also fixed Victoria bundles as they were trying to deploy
    percona-cluster.

    Func-Test-Pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/434
    Change-Id: I575d00b993fbff33d80956278b01e87e434713e0
    Related-Bug: #1877076

summary: - test_410_cinder_vol_create_backup_delete_restore_pool_inspect failing
+ New backup volume sometimes stuck in 'creating' state
Changed in charm-cinder-backup:
importance: High → Medium
assignee: Aurelien Lourot (aurelien-lourot) → nobody