ceph-rbd-mirror function tests fail on train
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph Monitor Charm | Fix Released | High | Liam Young |
Ceph RBD Mirror Charm | Invalid | Undecided | Liam Young |
OpenStack Charm Test Infra | Invalid | Undecided | Unassigned |
Bug Description
The charm-ceph-rbd-mirror functional tests fail on bionic-train.
In the train bundle we had to switch to juju storage for the ceph-osd units, which I think has something to do with the failure. Switching bionic-stein to juju storage also results in the same initial timeout issues with test_cinder_volume_mirrored.
The only bundle difference between bionic-stein.yaml and bionic-train.yaml, other than the source/openstack-origin values, is the ceph-osd storage configuration:
# bionic-stein
ceph-osd:
  charm: cs:~openstack-
  num_units: 3
  options:
    source: cloud:bionic-stein
    bluestore: False
    use-
    osd-devices: /opt
# bionic-train
ceph-osd:
  charm: cs:~openstack-
  num_units: 3
  storage:
    osd-devices: 'cinder,10G'
  options:
    source: cloud:bionic-train
    bluestore: False
    use-
    osd-devices: '/dev/test-
The first failure I came across was timeouts with test_cinder_volume_mirrored waiting for the volume to become available, which this patch works around:
--- a/zaza/
+++ b/zaza/
@@ -210,8 +210,13 @@ class CephRBDMirrorTest
         volume = cinder.
         try:
+            # Note(coreycb): stop_after_attempt is increased because using
+            # juju storage for ceph-osd backed by cinder on undercloud
+            # takes longer than the prior method of directory-backed OSD
+            # devices.
             openstack_utils.resource_reaches_status(
-                cinder.volumes, volume.id, msg='volume')
+                cinder.volumes, volume.id, msg='volume',
+                stop_after_attempt=...)
         except AssertionError:
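As an aside, the retry machinery that stop_after_attempt tunes is tenacity-based; here is a minimal self-contained sketch of the same polling pattern, with illustrative attempt and wait values rather than zaza's defaults:

import tenacity


@tenacity.retry(
    retry=tenacity.retry_if_exception_type(AssertionError),
    wait=tenacity.wait_exponential(multiplier=1, max=60),
    # A higher attempt cap gives slow cinder-backed OSD devices time
    # to let the volume reach 'available' before the test gives up.
    stop=tenacity.stop_after_attempt(80),
    reraise=True)
def wait_for_volume_available(cinder, volume_id):
    """Poll a Cinder volume until its status is 'available'."""
    volume = cinder.volumes.get(volume_id)
    assert volume.status == 'available', (
        'volume {} is {}'.format(volume_id, volume.status))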
That allows the volume to become available after a long time:
(func-smoke) ubuntu@
2019-11-08 15:07:15 [INFO] ## Running Test zaza.openstack.
2019-11-08 15:07:24 [INFO] Using keystone API V3 (or later) for overcloud auth
2019-11-08 15:07:26 [INFO] test_cinder_
2019-11-08 15:07:26 [INFO] Validate that a volume created through Cinder is mirrored.
2019-11-08 15:07:26 [INFO] ...
2019-11-08 15:07:30 [INFO] Using keystone API V3 (or later) for overcloud auth
2019-11-08 15:07:31 [WARNING] Version 2 is deprecated, use alternative version 3 instead.
2019-11-08 15:07:34 [INFO] creating
2019-11-08 15:07:35 [INFO] creating
2019-11-08 15:07:37 [INFO] creating
2019-11-08 15:07:42 [INFO] downloading
2019-11-08 15:07:50 [INFO] downloading
2019-11-08 15:08:06 [INFO] downloading
2019-11-08 15:08:38 [INFO] downloading
2019-11-08 15:09:38 [INFO] downloading
2019-11-08 15:10:39 [INFO] downloading
2019-11-08 15:11:39 [INFO] downloading
2019-11-08 15:12:39 [INFO] downloading
2019-11-08 15:13:40 [INFO] downloading
2019-11-08 15:14:40 [INFO] downloading
2019-11-08 15:15:40 [INFO] downloading
2019-11-08 15:16:40 [INFO] downloading
2019-11-08 15:17:41 [INFO] downloading
2019-11-08 15:18:41 [INFO] available
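For context, those 'creating'/'downloading' statuses come from the volume being built from a Glance image. A minimal sketch of the equivalent API call with python-cinderclient, where the session, image id, and volume name are illustrative rather than the test's literals:

from cinderclient import client as cinder_client

# 'session' is an authenticated keystoneauth1 session for the
# overcloud; 'image_id' is the Glance image the volume is built from.
cinder = cinder_client.Client(3, session=session)
volume = cinder.volumes.create(
    size=1, name='zaza-mirror-test', imageRef=image_id)
# The volume walks creating -> downloading -> available while Cinder
# pulls the image into the cinder-ceph RBD pool.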
But the tests hang after that. At some point during that test the workload status switched to a Pools WARNING for ceph-rbd-mirror-b/0:
Unit Workload Agent Machine Public address Ports Message
ceph-mon-b/0 active idle 3 10.5.0.48 Unit is ready and clustered
ceph-mon-b/1* active idle 4 10.5.0.9 Unit is ready and clustered
ceph-mon-b/2 active idle 5 10.5.0.36 Unit is ready and clustered
ceph-mon/0* active idle 0 10.5.0.8 Unit is ready and clustered
ceph-mon/1 active idle 1 10.5.0.45 Unit is ready and clustered
ceph-mon/2 active idle 2 10.5.0.17 Unit is ready and clustered
ceph-osd-b/0* active idle 9 10.5.0.10 Unit is ready (1 OSD)
ceph-osd-b/1 active idle 10 10.5.0.13 Unit is ready (1 OSD)
ceph-osd-b/2 active idle 11 10.5.0.21 Unit is ready (1 OSD)
ceph-osd/0 active idle 6 10.5.0.47 Unit is ready (1 OSD)
ceph-osd/1 active idle 7 10.5.0.52 Unit is ready (1 OSD)
ceph-osd/2* active idle 8 10.5.0.14 Unit is ready (1 OSD)
ceph-rbd-
ceph-rbd-mirror/0* active idle 12 10.5.0.26 Unit is ready (Pools OK (2) Images Primary (2))
cinder/0* active idle 14 10.5.0.40 8776/tcp Unit is ready
cinder-ceph/0* active idle 10.5.0.40 Unit is ready
glance/0* active idle 15 10.5.0.38 9292/tcp Unit is ready
keystone/0* active idle 16 10.5.0.30 5000/tcp Unit is ready
mysql/0* active idle 17 10.5.0.27 3306/tcp Unit is ready
rabbitmq-server/0* active idle 18 10.5.0.39 5672/tcp Unit is ready
Looking deeper at rbd mirror pool commands:
ubuntu@
Mode: pool
Peers:
UUID NAME CLIENT
7e39c18b-
ubuntu@
health: WARNING
images: 1 total
1 unknown
volume-
global_id: ad3dd728-
state: down+replaying
description: replaying, master_
last_update: 2019-11-08 15:03:28
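As an aside, this health check can be scripted on one of the mirror units rather than eyeballed. A hedged sketch follows; the pool name, client id, and exact JSON keys are assumptions (Nautilus-era rbd output), so verify against the deployed release:

import json
import subprocess


def mirror_pool_health(pool='cinder-ceph', client='rbd-mirror.juju-xyz'):
    """Return (health, images) from 'rbd mirror pool status'."""
    out = subprocess.check_output(
        ['sudo', 'rbd', '--id', client, 'mirror', 'pool', 'status',
         pool, '--verbose', '--format', 'json'])
    status = json.loads(out)
    # 'summary.health' is the OK/WARNING flag shown above; per-image
    # states such as down+replaying live under 'images' with --verbose.
    return status['summary']['health'], status.get('images', [])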
On ceph-rbd-mirror-b/0 I'm seeing 'image no longer exists' in the Ceph log:
2019-11-07 21:24:11.640 7f5d7f316700 0 rbd::mirror:
2019-11-07 21:24:27.664 7f5d7f316700 0 rbd::mirror:
Also on ceph-rbd-mirror-b/0, I'm seeing a lot of timeout messages like this in the Ceph log:
2019-11-08 15:10:08.110 7f5da7fff700 -1 rbd::mirror:
On ceph-rbd-mirror/0 I'm also seeing 'image no longer exists' in the Ceph log:
2019-11-07 21:07:53.949 7f3233a6a700 0 rbd::mirror:
2019-11-07 21:24:38.562 7f320b316700 0 rbd::mirror:
Restarting the services with 'sudo systemctl restart ceph-rbd-mirror*' gets things looking better after a little while:
Unit Workload Agent Machine Public address Ports Message
ceph-rbd-
ceph-rbd-mirror/0* active idle 12 10.5.0.26 Unit is ready (Pools OK (2) Images Primary (4))
And rbd mirror pool status looks good:
ubuntu@
health: OK
images: 3 total
3 replaying
rbd: failed to get service dump: (13) Permission denied
volume-
global_id: ad3dd728-
state: up+replaying
description: replaying, master_
service:
last_update: 2019-11-08 17:31:11
volume-
global_id: ac2b3be5-
state: up+replaying
description: replaying, master_
service:
last_update: 2019-11-08 17:31:14
volume-
global_id: 903617e5-
state: up+replaying
description: replaying, master_
service:
last_update: 2019-11-08 17:31:10
But re-running the test gets it back to "Pools WARNING" state.
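For anyone reproducing this, a minimal sketch of scripting the restart workaround above with zaza.model; the unit names match the status output, and this only papers over the problem until the test runs again:

import zaza.model

# Restart the rbd-mirror daemons on both sides, mirroring the manual
# 'sudo systemctl restart ceph-rbd-mirror*' workaround above.
for unit in ('ceph-rbd-mirror/0', 'ceph-rbd-mirror-b/0'):
    zaza.model.run_on_unit(
        unit, 'sudo systemctl restart ceph-rbd-mirror*')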
summary:
- ceph-rbd-mirror function tests fail with ceph-osd using juju storage
+ ceph-rbd-mirror function tests fail with ceph-osd juju storage
Changed in charm-ceph-mon:
assignee: nobody → Liam Young (gnuoy)
Changed in charm-ceph-rbd-mirror:
status: New → Invalid
Changed in charm-test-infra:
status: New → Invalid
Changed in charm-ceph-mon:
importance: Undecided → High
milestone: none → 20.05
status: New → Fix Committed
Changed in charm-ceph-mon:
status: Fix Committed → Fix Released
I tried bionic-stein with juju storage and, while the tests do appear to run fine, they take a very long time. In any case, I think the juju storage issue might be separate from the rbd-mirror issue on train.