cinder volume create failed
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | High | Bob Church |
Bug Description
Brief Description
-----------------
Creating a volume from an Ubuntu image, the volume status changed from "creating" to "downloading", but eventually ended up in "error" status.
Severity
--------
Major
Steps to Reproduce
------------------
....
TC-name: networking/
Expected Behavior
------------------
status is "available"
Actual Behavior
----------------
status is "error"
Reproducibility
---------------
Reproducible
System Configuration
--------------------
One node system
Lab-name: SM-2
Branch/Pull Time/Commit
-----------------------
stx master as of 20190506T233000Z
Last Pass
---------
20190505T233000Z
Timestamp/Logs
--------------
[2019-05-07 08:37:42,352] 262 DEBUG MainThread ssh.send :: Send 'cinder --os-username 'tenant2' --os-password 'Li69nux*' --os-project-name tenant2 --os-auth-url http://
[2019-05-07 08:37:44,323] 387 DEBUG MainThread ssh.expect :: Output:
+------
| Property | Value |
+------
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2019-05-
| description | None |
| encrypted | False |
| id | 449bba0d-
| metadata | {} |
| multiattach | False |
| name | vol-ubuntu_14-2 |
| os-vol-
| replication_status | None |
| size | 3 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
[2019-05-07 08:37:49,088] 262 DEBUG MainThread ssh.send :: Send 'cinder --os-username 'tenant2' --os-password 'Li69nux*' --os-project-name tenant2 --os-auth-url http://
[2019-05-07 08:37:50,507] 387 DEBUG MainThread ssh.expect :: Output:
+------
| Property | Value |
+------
| attached_servers | [] |
| attachment_ids | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2019-05-
| description | None |
| encrypted | False |
| id | 449bba0d-
| metadata | |
| multiattach | False |
| name | vol-ubuntu_14-2 |
| os-vol-
| replication_status | None |
| size | 3 |
| snapshot_id | None |
| source_volid | None |
| status | downloading |
[2019-05-07 08:37:55,191] 262 DEBUG MainThread ssh.send :: Send 'cinder --os-username 'tenant2' --os-password 'Li69nux*' --os-project-name tenant2 --os-auth-url http://
[2019-05-07 08:37:56,599] 387 DEBUG MainThread ssh.expect :: Output:
+------
| Property | Value |
+------
| attached_servers | [] |
| attachment_ids | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2019-05-
| description | None |
| encrypted | False |
| id | 449bba0d-
| metadata | |
| multiattach | False |
| name | vol-ubuntu_14-2 |
| os-vol-
| replication_status | None |
| size | 3 |
| snapshot_id | None |
| source_volid | None |
| status | error |
[wrsroot@
+------
| ID | Tenant ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+------
| 449bba0d-
| ab2305b8-
| fd4696b8-
+------
[wrsroot@
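The status transitions captured above (creating → downloading → error) are what a polling test harness observes. A minimal sketch of such a poll loop, assuming a hypothetical `get_status` callable that stands in for parsing `cinder show <id>` output:

```python
import time

TERMINAL_STATES = {"available", "error"}

def wait_for_volume(get_status, volume_id, timeout=300, interval=5):
    """Poll a volume's status until it reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(volume_id)  # e.g. parsed from `cinder show <id>`
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval)
    raise TimeoutError(f"volume {volume_id} never reached a terminal state")

# Simulate the transitions recorded in the logs above
states = iter(["creating", "downloading", "error"])
result = wait_for_volume(lambda vid: next(states), "449bba0d", interval=0)
print(result)  # error
```

In the failing run, such a loop terminates with "error" instead of the expected "available".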
Test Activity
-------------
Sanity
tags: added: stx.retestneeded
Changed in starlingx:
assignee: nobody → Daniel Badea (daniel.badea)
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
tags: added: stx.sanity
Changed in starlingx:
status: Confirmed → Fix Committed
Time frame when the error occurred:
2019-05-07 08:37:49 ... 2019-05-07 08:37:55
Unfortunately there are no cinder-volume logs for the time frame when the error occurred:
cinder-volume-f9d47fc7d-q85mf_openstack_cinder-volume-98ccc7d2009c797516e62ef407fcf33231d8ba53314bbcaa0475ce125c95978b.log starts at 2019-05-07T09:56:28.869172348Z
cinder-volume-f9d47fc7d-q85mf_openstack_cinder-volume-b753345a608b14dc44623346840b969bde6a67e1b6b827200e3867ef28dce715.log starts at 2019-05-07T10:26:36.965982746Z
I assume cinder-volume has crashed, container was restarted and its logs were discarded.
Looking at sm-customer.log there's some mgr-restful-plugin activity:
| 2019-05-07T08:44:43.835 | 268 | service-scn | mgr-restful-plugin | enabled-active | disabled | audit state mismatch |
| 2019-05-07T08:44:43.973 | 269 | service-scn | mgr-restful-plugin | disabled | enabling | enabled-active state requested |
| 2019-05-07T08:44:44.144 | 270 | service-scn | mgr-restful-plugin | enabling | enabled-active | enable success |
and also in ceph-mgr.controller-0.log there's evidence ceph-mgr (providing the REST API) was restarted:
2019-05-07 08:44:37.992 7fbe53950700 -1 received signal: Terminated from /usr/bin/python /etc/init.d/mgr-restful-plugin start (PID: 90056) UID: 0
2019-05-07 08:44:37.992 7fbe53950700 -1 mgr handle_signal *** Got signal Terminated ***
2019-05-07 08:44:46.319 7fa91d533380 0 ceph version 13.2.2 (67ecc2fbbca4f602f1172fa8e561b813eb564df5) mimic (stable), process ceph-mgr, pid 430346
At the same time ceph-manager continuously reports Ceph is in HEALTH_WARN:
2019-05-07 08:44:21.343 90851 INFO ceph_manager.monitor [-] Current Ceph health: HEALTH_WARN detail: Degraded data redundancy: 1183/2835 objects degraded (41.728%), 83 pgs degraded, 264 pgs undersized
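For reference, the health line can be checked mechanically; a small sketch (the regex and variable names are my own, only the log line itself comes from the report), which also confirms the reported percentage matches the raw object counts:

```python
import re

line = ("2019-05-07 08:44:21.343 90851 INFO ceph_manager.monitor [-] "
        "Current Ceph health: HEALTH_WARN detail: Degraded data redundancy: "
        "1183/2835 objects degraded (41.728%), 83 pgs degraded, 264 pgs undersized")

m = re.search(r"health: (HEALTH_\w+).*?(\d+)/(\d+) objects degraded \(([\d.]+)%\)", line)
status = m.group(1)
degraded, total, pct = int(m.group(2)), int(m.group(3)), float(m.group(4))
# 1183/2835 objects degraded is indeed 41.728%
print(status, round(degraded / total * 100, 3))  # HEALTH_WARN 41.728
```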
So what I think happened:
- controller-0 was locked/unlocked
- after reboot services are starting up and Ceph enters HEALTH_WARN and starts recovery procedure
- openstack services become available before Ceph is in HEALTH_OK
- cinder-volume tries to write downloaded volume but its request is suspended by ceph-mgr (Ceph usually hangs on to a request until it enters HEALTH_OK and then responds to it)
- but meanwhile mgr-restful-plugin detects ceph-mgr is slow to respond and restarts it
- when this happens the cinder-volume connection to ceph-mgr drops and the volume write request fails
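A workaround consistent with this analysis is to gate volume operations on cluster health until recovery completes. A hedged sketch (the `get_health` callable, standing in for running `ceph health` and capturing its output, is hypothetical):

```python
import time

def ceph_ready(health_line: str) -> bool:
    """Treat only HEALTH_OK as ready: during HEALTH_WARN recovery, requests
    can hang long enough for mgr-restful-plugin to restart ceph-mgr."""
    return health_line.strip().startswith("HEALTH_OK")

def wait_for_ceph(get_health, timeout=600, interval=10):
    """Poll cluster health until HEALTH_OK or timeout; returns True if ready."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if ceph_ready(get_health()):
            return True
        time.sleep(interval)
    return False

print(ceph_ready("HEALTH_WARN Degraded data redundancy"))  # False
print(ceph_ready("HEALTH_OK"))  # True
```

Calling something like `wait_for_ceph` after the lock/unlock, before re-enabling volume creation, would avoid issuing writes while Ceph is still recovering.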