incorrect replica count in a single-unit ceph deployment

Bug #1565120 reported by Jason Hobbs
This bug affects 1 person
Affects                            Status        Importance  Assigned to     Milestone
ceph (Juju Charms Collection)      Fix Released  Critical    Chris Holcombe
ceph-mon (Juju Charms Collection)  Fix Released  Critical    Chris Holcombe
glance (Juju Charms Collection)    Invalid       Undecided   Unassigned

Bug Description

With ceph deployed as a single unit (no quorum), ceph-backed cinder and ceph-backed glance are broken: glance image uploads fail, and volume creation likely fails as well.

Although the documented recommendation for ceph is at least 3 units, some test scenarios, such as those exercising expensive storage hardware, deploy only one unit.

OIL is consistently failing to upload images to glance when testing with the -next charms.

We're getting an error with this traceback:
https://pastebin.canonical.com/153259/

Here is the deployment yaml:
https://pastebin.canonical.com/153262/

Here is juju status output:
https://pastebin.canonical.com/153260/

I've attached logs from a glance unit where this failed.

Tags: oil

Jason Hobbs (jason-hobbs) wrote :
Ryan Beisner (1chb1n) wrote :

Can you please try forcing API version 1 to see if that resolves the issue?

Example:

glance --os-image-api-version 1 image-create --name="trusty" --is-public=true --progress --container-format=bare --disk-format=qcow2 < ~/images/trusty-server-cloudimg-amd64-disk1.img

Jason Hobbs (jason-hobbs) wrote :

Hi Ryan, we're already forcing version 1:

glanceclient.Client('1', endpoint=endpoint, token=token)

Any other ideas?

Jason Hobbs (jason-hobbs) wrote :

I've been able to reproduce this in manual testing, and I think it's ceph-related.

When I start an image upload, it stalls after about 30 seconds and then disconnects:

ubuntu@doberman-dev:~/debug-next$ glance image-create --name="trusty-test5" --progress --container-format=bare --disk-format=qcow2 < trusty-server-cloudimg-amd64-disk1.img
[=> ] 2%Error finding address for http://10.245.44.192:9292/v1/images: [Errno 32] Broken pipe

I've tried this many times. The ceph logs end up with many warnings about slow requests, some nearly a day old now:

2016-04-13 20:07:16.040687 osd.0 10.245.0.168:6800/51477 282 : cluster [WRN] 13 slow requests, 1 included below; oldest blocked for > 78521.766937 secs
2016-04-13 20:07:16.040692 osd.0 10.245.0.168:6800/51477 283 : cluster [WRN] slow request 480.288454 seconds old, received at 2016-04-13 19:59:15.752203: osd_op(client.15448.0:1 9b4a2eff-3f7b-47ca-97df-2c7bebe79fc3.rbd [stat] 2.55bb04cb ack+read+known_if_redirected e9) currently reached_pg
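When triaging logs like the two lines above, it can help to pull out the request age and the state each request is stuck in. The following is a minimal sketch in Python; the regular expressions are modelled only on the two sample lines in this comment, and the function names (parse_summary, parse_slow_request) are hypothetical helpers, not part of ceph or any charm.

```python
import re

# Both patterns are assumptions derived from the two sample lines in this
# report; other ceph releases may word these warnings differently.
SUMMARY = re.compile(
    r"(?P<count>\d+) slow requests, .*"
    r"oldest blocked for > (?P<oldest>\d+(?:\.\d+)?) secs"
)
SLOW_REQUEST = re.compile(
    r"slow request (?P<age>\d+(?:\.\d+)?) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r".*currently (?P<state>\S+)\s*$"
)


def parse_summary(line):
    """Return (request_count, oldest_blocked_seconds), or None if no match."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    return int(m.group("count")), float(m.group("oldest"))


def parse_slow_request(line):
    """Return age, receive time and current state, or None if no match."""
    m = SLOW_REQUEST.search(line)
    if m is None:
        return None
    return {
        "age_seconds": float(m.group("age")),
        "received_at": m.group("received"),
        "state": m.group("state"),
    }


if __name__ == "__main__":
    summary = (
        "2016-04-13 20:07:16.040687 osd.0 10.245.0.168:6800/51477 282 : "
        "cluster [WRN] 13 slow requests, 1 included below; "
        "oldest blocked for > 78521.766937 secs"
    )
    count, oldest = parse_summary(summary)
    # Convert the oldest blocked time to hours for easier reading.
    print(count, round(oldest / 3600, 1))
```

Applied to the summary line above, the oldest blocked time of 78521.766937 seconds works out to roughly 21.8 hours, which matches the "nearly a day old" observation.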

I'm only running one ceph unit, which has been fine in the past. I wonder if something changed in the ceph charms so that this no longer works?
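One way a replica count could make a single unit stop working is sketched below. This is a simplified model, not the charm's code: it assumes one replica per host (a host-level CRUSH failure domain) and ceph's documented default of min_size = size - size // 2 when no explicit minimum is configured. Whether the pools in this deployment were actually created with three replicas is an assumption suggested by the bug title, not something this thread confirms. The function names are hypothetical.

```python
def default_min_size(size):
    """Default minimum replicas needed for I/O: size - size // 2.

    This mirrors ceph's documented default when no explicit min_size
    is configured for the pool.
    """
    return size - size // 2


def pool_can_serve_io(size, available_hosts):
    """Simplified model: can a pool with `size` replicas serve I/O?

    Assumes at most one replica is placed per host, so the number of
    replicas that can exist is capped by the number of hosts.
    """
    placed = min(size, available_hosts)
    return placed >= default_min_size(size)


if __name__ == "__main__":
    # Three replicas requested, one host: only one replica can be placed,
    # which is below the minimum of two, so requests would block.
    print(pool_can_serve_io(size=3, available_hosts=1))
    # One replica requested, one host: the pool can serve I/O.
    print(pool_can_serve_io(size=1, available_hosts=1))
```

Under these assumptions, a pool asking for three replicas on a single unit never reaches its minimum, which would be consistent with requests sitting in the reached_pg state indefinitely.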

Ryan Beisner (1chb1n) wrote :

We've not encountered a similar issue in the next charm test automation, but we only test ceph with the 3 units as recommended.

Jason Hobbs (jason-hobbs) wrote :

Here is the deployment yaml from the deployment with the ceph logs attached above:

http://paste.ubuntu.com/15820049/

Changed in ceph (Juju Charms Collection):
assignee: nobody → Chris Holcombe (xfactor973)
Ryan Beisner (1chb1n)
description: updated
Changed in glance (Juju Charms Collection):
status: New → Invalid
Changed in ceph (Juju Charms Collection):
importance: Undecided → Critical
status: New → Confirmed
status: Confirmed → In Progress
description: updated
summary: - Failure uploading images with next charms
+ incorrect replica count in a single-unit ceph deployment
Ryan Beisner (1chb1n)
Changed in ceph (Juju Charms Collection):
milestone: none → 16.04
Changed in ceph-mon (Juju Charms Collection):
assignee: nobody → Chris Holcombe (xfactor973)
status: New → In Progress
importance: Undecided → Critical
milestone: none → 16.04
Ryan Beisner (1chb1n)
Changed in ceph (Juju Charms Collection):
status: In Progress → Fix Committed
Changed in ceph-mon (Juju Charms Collection):
status: In Progress → Fix Committed
James Page (james-page)
Changed in ceph (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in ceph-mon (Juju Charms Collection):
status: Fix Committed → Fix Released