cinder-volume with ceph backend show 'total_capacity_gb': 0 if any of ceph-osd is down

Bug #1481785 reported by Dennis Dmitriev
Affects: Mirantis OpenStack
Status: Fix Released
Importance: High
Assigned to: Yuriy Nesenenko

Bug Description

If any of the ceph-osd nodes is disconnected or shut down, cinder becomes unable to create volumes, because 'cinder-volume' reports 0 free space on the storage.

System test 'ceph_ha_restart' is failing with the following error:

AssertionError: Failed 3 OSTF tests; should fail 1 tests. Names of failed tests: [{u'Check that required services are running (failure)': u'Some nova services have not been started.. Please refer to OpenStack logs for more details.'}, {u'Create volume and boot instance from it (failure)': u'Failed to get to expected status. In error state. Please refer to OpenStack logs for more details.'}, {u'Create volume and attach it to instance (failure)': u'Failed to get to expected status. In error state. Please refer to OpenStack logs for more details.'}]

Steps to reproduce:
            1. Create a cluster with Ceph enabled for images and volumes
            2. Add 3 nodes with controller and ceph OSD roles
            3. Add 1 node with the ceph OSD role only
            4. Add 2 nodes with compute and ceph OSD roles
            5. Deploy the cluster
            6. Shut down one node with the ceph OSD role (for example, a compute node or the node with only the ceph OSD role)
            7. Wait 10 minutes
            8. Check /var/log/cinder/cinder-scheduler.log for the phrase "Received volume service update from ... {..., 'total_capacity_gb': N, ...}", or /var/log/cinder/cinder-volume.log for the same variable

Expected result: in 'total_capacity_gb': N, the value N is reduced by the size of the ceph OSD that was shut down.

Actual result: in 'total_capacity_gb': N, the value N is zero, despite the fact that 'ceph status' shows plenty of free space.

Try to create a volume from the command line on a controller node:
$ . openrc; cinder create 1
Then check the status of the created volume:
$ . openrc; cinder list
The created volume will be in 'error' status.

Example (controller nodes are node-1, node-4 and node-6):

* all nodes online: http://paste.openstack.org/show/408497/
* a compute node with ceph-osd is offline: http://paste.openstack.org/show/408498/

root@node-1:/var/log/cinder# cinder --version
1.1.1
root@node-1:/var/log/cinder# ceph --version
ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)

Tags: cinder
Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

Fuel ISO version: {u'build_id': u'2015-07-29_21-24-26', u'build_number': u'108', u'auth_required': True, u'fuel-ostf_sha': u'f907eca277ab4ba769774417d6ff5bf30ef479ee', u'fuel-library_sha': u'90aff7558fb00373ccd363b7722e2f90dc25894d', u'nailgun_sha': u'999efffd19b823a27b17f0e97a42ac0d47ae9ce5', u'openstack_version': u'2015.1.0-7.0', u'fuel-nailgun-agent_sha': u'1512b9af6b41cc95c4d891c593aeebe0faca5a63', u'fuel-agent_sha': u'355c08a04917f047b88f66242767049d2b1d0ff0', u'api': u'1.0', u'python-fuelclient_sha': u'f04e6c46783ecd6000df31b61b6749da66d4d828', u'astute_sha': u'126709e7f18a719ec4bd2a13a37d972285381892', u'fuelmain_sha': u'de5b333815f8541224c6726dc8446ffc7fb18b5b', u'feature_groups': [u'mirantis'], u'release': u'7.0', u'release_versions': {u'2015.1.0-7.0': {u'VERSION': {u'build_id': u'2015-07-29_21-24-26', u'build_number': u'108', u'fuel-library_sha': u'90aff7558fb00373ccd363b7722e2f90dc25894d', u'nailgun_sha': u'999efffd19b823a27b17f0e97a42ac0d47ae9ce5', u'fuel-ostf_sha': u'f907eca277ab4ba769774417d6ff5bf30ef479ee', u'fuel-nailgun-agent_sha': u'1512b9af6b41cc95c4d891c593aeebe0faca5a63', u'fuel-agent_sha': u'355c08a04917f047b88f66242767049d2b1d0ff0', u'api': u'1.0', u'python-fuelclient_sha': u'f04e6c46783ecd6000df31b61b6749da66d4d828', u'astute_sha': u'126709e7f18a719ec4bd2a13a37d972285381892', u'fuelmain_sha': u'de5b333815f8541224c6726dc8446ffc7fb18b5b', u'feature_groups': [u'mirantis'], u'release': u'7.0', u'openstack_version': u'2015.1.0-7.0', u'production': u'docker'}}}, u'production': u'docker'}

description: updated
Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

Wrong data is coming from here:
# /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py
...
            with RADOSClient(self) as client:
                ret, outbuf, _outs = client.cluster.mon_command(
                    '{"prefix":"df", "format":"json"}', '')
...

# Here is the 'outbuf' content when all nodes in the cluster are running:

{"stats":{"total_space":395312560,"total_used":12877064,"total_avail":382435496},
"pools":[
{"name":"data","id":0,"stats":{"kb_used":0,"bytes_used":0,"max_avail":172150433079,"objects":0}},
{"name":"metadata","id":1,"stats":{"kb_used":0,"bytes_used":0,"max_avail":172150433079,"objects":0}},
{"name":"rbd","id":2,"stats":{"kb_used":0,"bytes_used":0,"max_avail":172150433079,"objects":0}},
{"name":"images","id":3,"stats":{"kb_used":16691,"bytes_used":17090817,"max_avail":172150433079,"objects":8}},
{"name":"volumes","id":4,"stats":{"kb_used":1,"bytes_used":15,"max_avail":172150433079,"objects":3}},
{"name":"backups","id":5,"stats":{"kb_used":0,"bytes_used":0,"max_avail":172150433079,"objects":0}},
{"name":"compute","id":6,"stats":{"kb_used":0,"bytes_used":0,"max_avail":172150433079,"objects":0}}
]}

# And here is the 'outbuf' content after a compute node was shut down:

{"stats":{"total_space":355807504,"total_used":10739104,"total_avail":345068400},
"pools":[
{"name":"data","id":0,"stats":{"kb_used":0,"bytes_used":0,"max_avail":0,"objects":0}},
{"name":"metadata","id":1,"stats":{"kb_used":0,"bytes_used":0,"max_avail":0,"objects":0}},
{"name":"rbd","id":2,"stats":{"kb_used":0,"bytes_used":0,"max_avail":0,"objects":0}},
{"name":"images","id":3,"stats":{"kb_used":16691,"bytes_used":17090817,"max_avail":0,"objects":8}},
{"name":"volumes","id":4,"stats":{"kb_used":1,"bytes_used":15,"max_avail":0,"objects":3}},
{"name":"backups","id":5,"stats":{"kb_used":0,"bytes_used":0,"max_avail":0,"objects":0}},
{"name":"compute","id":6,"stats":{"kb_used":0,"bytes_used":0,"max_avail":0,"objects":0}}
]}
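To see how this turns into 'total_capacity_gb': 0, here is a minimal sketch of the kind of computation a pool-based driver performs over the df output (trimmed JSON and an illustrative function name, not the actual rbd.py code):

```python
import json

GiB = 1024 ** 3

def capacity_from_df(df_json, pool_name="volumes"):
    """Sketch of a pool-based capacity report: free capacity comes from
    the pool's max_avail, total is free plus the pool's bytes_used.
    (Illustrative only; not the actual rbd.py code.)"""
    stats = next(p["stats"] for p in json.loads(df_json)["pools"]
                 if p["name"] == pool_name)
    free_gb = stats["max_avail"] / GiB
    total_gb = free_gb + stats["bytes_used"] / GiB
    return total_gb, free_gb

# Trimmed 'volumes' pool stats from the two pastes above.
healthy = '{"pools":[{"name":"volumes","stats":{"bytes_used":15,"max_avail":172150433079}}]}'
degraded = '{"pools":[{"name":"volumes","stats":{"bytes_used":15,"max_avail":0}}]}'
# With one OSD down, max_avail collapses to 0, so the reported
# total_capacity_gb collapses to (almost) 0 as well.
```

With the healthy paste this yields roughly 160 GB; with the degraded paste the total is effectively zero, which matches the scheduler refusing to place new volumes.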

description: updated
summary: - cinder-scheduler with ceph backend show 'total_capacity_gb': 0 if any of
+ cinder-volume with ceph backend show 'total_capacity_gb': 0 if any of
ceph-osd is down
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

According to
"Wrong data is coming from here:
# /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py
...
            with RADOSClient(self) as client:
                ret, outbuf, _outs = client.cluster.mon_command(
                    '{"prefix":"df", "format":"json"}', '')"
this should be passed to either the MOS Ceph or the MOS Cinder team

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → MOS Ceph (mos-ceph)
Changed in fuel:
status: New → Confirmed
Revision history for this message
Kostiantyn Danylov (kdanylov) wrote :

This bug is on the cinder side; Ceph reports the size correctly.

Anton Arefiev (aarefiev)
Changed in fuel:
assignee: MOS Ceph (mos-ceph) → MOS Cinder (mos-cinder)
Anton Arefiev (aarefiev)
affects: fuel → mos
Changed in mos:
milestone: 7.0 → none
Anton Arefiev (aarefiev)
Changed in mos:
assignee: MOS Cinder (mos-cinder) → Kostiantyn Danylov (kdanylov)
Revision history for this message
Aleksander Mogylchenko (amogylchenko) wrote :

Please target this bug to a milestone.

Changed in mos:
milestone: none → 7.0
Revision history for this message
Aleksander Mogylchenko (amogylchenko) wrote :

Moved to 7.0. If this is not the proper target, please update accordingly (set 'won't fix' for 7.0 and add the appropriate target).

Ivan Kolodyazhny (e0ne)
Changed in mos:
assignee: Kostiantyn Danylov (kdanylov) → Yuriy Nesenenko (ynesenenko)
tags: added: cinder
Revision history for this message
Jon Proulx (jproulx) wrote :

This *IS* actually a ceph bug https://bugzilla.redhat.com/show_bug.cgi?id=1225081

Description of problem:
Ceph df normally reports the MAX AVAIL space considering the OSDs in the ruleset, but when one of the OSDs is down and out it just reports 0 instead of the real MAX AVAIL space for the pools using that ruleset.

Fixed in ceph 0.80.10 and newer.

Revision history for this message
Stephen Jahl (sjahl) wrote :

Hi, I came across this bug report while I was researching the same issue on my (non-mirantis) openstack/ceph setup. It actually ended up being a ceph bug which was fixed in this pull request: https://github.com/ceph/ceph/pull/3826

That merge was included in the ceph 0.80.10 release, and simply updating to that release fixed the issue for me.

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/cinder (openstack-ci/fuel-7.0/2015.1.0)

Reviewed: https://review.fuel-infra.org/10426
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-7.0/2015.1.0

Commit: 337bfceeeef5e9acbc489fa66163666547d46f25
Author: Yuriy Nesenenko <email address hidden>
Date: Wed Aug 19 13:33:19 2015

Fix in Cinder size with ceph backend if any of ceph-osd is down

If any of ceph-osd nodes was disconnected/shutdown, then Cinder is
unable to create volumes because 'cinder-volume' show that there
is 0 free space on the storage, despite the fact that 'ceph status'
show a lot of free space.

Closes-Bug: #1481785
Change-Id: I30bf70667374c2d4b51ac29cac4966298c7dd7fe
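The change itself lives behind the MOS Gerrit review linked above; purely as a hypothetical illustration of a defensive fix on the cinder side (not necessarily what the merged patch does), a driver could fall back to the cluster-wide total_avail when the pool's max_avail reads 0:

```python
import json

def free_kb_with_fallback(df_json, pool_name="volumes"):
    """Hypothetical fallback (illustration only, not the merged patch):
    use the pool's max_avail (bytes) when it is non-zero; when a down OSD
    makes it read 0, fall back to the cluster-wide total_avail, which the
    df output in the pastes above reports in KB and still fills in
    correctly while an OSD is down."""
    df = json.loads(df_json)
    stats = next(p["stats"] for p in df["pools"] if p["name"] == pool_name)
    if stats["max_avail"] > 0:
        return stats["max_avail"] // 1024  # bytes -> KB
    return df["stats"]["total_avail"]      # already in KB

# Trimmed versions of the pastes above (one OSD down in `degraded`).
healthy = ('{"stats":{"total_avail":382435496},'
           '"pools":[{"name":"volumes","stats":{"max_avail":172150433079}}]}')
degraded = ('{"stats":{"total_avail":345068400},'
            '"pools":[{"name":"volumes","stats":{"max_avail":0}}]}')
```

Cluster-wide free space is an overestimate of what any one pool can hold, but it keeps the scheduler from reporting 0 capacity while the cluster is degraded.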

Changed in mos:
status: Confirmed → Fix Committed
Revision history for this message
Oleksiy Butenko (obutenko) wrote :

on verification

Revision history for this message
Oleksiy Butenko (obutenko) wrote :

verified on MOS 7.0 ISO 265
{"build_id": "265", "build_number": "265", "release_versions": {"2015.1.0-7.0": {"VERSION": {"build_id": "265", "build_number": "265", "api": "1.0", "fuel-library_sha": "4fdf3d6b070204366593012428395d173698678a", "nailgun_sha": "0dfcf73deb8ae99654f3da2ea95b7b68b9ee7273", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "d7027952870a35db8dc52f185bb1158cdd3d1ebd", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "082a47bf014002e515001be05f99040437281a2d", "production": "docker", "python-fuelclient_sha": "9643fa07f1290071511066804f962f62fe27b512", "astute_sha": "e63709d16bd4c1949bef820ac336c9393c040d25", "fuel-ostf_sha": "582a81ccaa1e439a3aec4b8b8f6994735de840f4", "release": "7.0", "fuelmain_sha": "9ab01caf960013dc882825dc9b0e11ccf0b81cb0"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "4fdf3d6b070204366593012428395d173698678a", "nailgun_sha": "0dfcf73deb8ae99654f3da2ea95b7b68b9ee7273", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "d7027952870a35db8dc52f185bb1158cdd3d1ebd", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "082a47bf014002e515001be05f99040437281a2d", "production": "docker", "python-fuelclient_sha": "9643fa07f1290071511066804f962f62fe27b512", "astute_sha": "e63709d16bd4c1949bef820ac336c9393c040d25", "fuel-ostf_sha": "582a81ccaa1e439a3aec4b8b8f6994735de840f4", "release": "7.0", "fuelmain_sha": "9ab01caf960013dc882825dc9b0e11ccf0b81cb0"}

Changed in mos:
status: Fix Committed → Fix Released
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/cinder (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Yuriy Nesenenko <email address hidden>
Review: https://review.fuel-infra.org/13328

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/cinder (openstack-ci/fuel-8.0/liberty)

Reviewed: https://review.fuel-infra.org/13328
Submitter: Pkgs Jenkins <email address hidden>
Branch: openstack-ci/fuel-8.0/liberty

Commit: b7ed0f16f936fc0cef6c4823c12b589ef8ebc559
Author: Yuriy Nesenenko <email address hidden>
Date: Fri Nov 13 13:01:14 2015

Fix in Cinder size with ceph backend if any of ceph-osd is down

If any of ceph-osd nodes was disconnected/shutdown, then Cinder is
unable to create volumes because 'cinder-volume' show that there
is 0 free space on the storage, despite the fact that 'ceph status'
show a lot of free space.

Closes-Bug: #1481785

Conflicts:

 cinder/volume/drivers/rbd.py

Change-Id: I30bf70667374c2d4b51ac29cac4966298c7dd7fe

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/cinder (9.0/mitaka)

Fix proposed to branch: 9.0/mitaka
Change author: Yuriy Nesenenko <email address hidden>
Review: https://review.fuel-infra.org/18514

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/cinder (9.0/mitaka)

Reviewed: https://review.fuel-infra.org/18514
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: 04c6762ff7a955bc3374f6c6ed7456c3f295456e
Author: Yuriy Nesenenko <email address hidden>
Date: Fri Apr 1 10:18:05 2016

Fix in Cinder size with ceph backend if any of ceph-osd is down

If any of ceph-osd nodes was disconnected/shutdown, then Cinder is
unable to create volumes because 'cinder-volume' show that there
is 0 free space on the storage, despite the fact that 'ceph status'
show a lot of free space.

Closes-Bug: #1481785
Change-Id: I30bf70667374c2d4b51ac29cac4966298c7dd7fe

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/cinder (mcp/newton)

Fix proposed to branch: mcp/newton
Change author: Yuriy Nesenenko <email address hidden>
Review: https://review.fuel-infra.org/33469

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/cinder (11.0/ocata)

Fix proposed to branch: 11.0/ocata
Change author: Yuriy Nesenenko <email address hidden>
Review: https://review.fuel-infra.org/34187

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/cinder (mcp/ocata)

Fix proposed to branch: mcp/ocata
Change author: Yuriy Nesenenko <email address hidden>
Review: https://review.fuel-infra.org/34850

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/cinder (11.0/ocata)

Change abandoned by Roman Podoliaka <email address hidden> on branch: 11.0/ocata
Review: https://review.fuel-infra.org/34187
Reason: we do not need 11.0/ocata anymore - use mcp/ocata instead

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/cinder (mcp/ocata)

Reviewed: https://review.fuel-infra.org/34850
Submitter: Pkgs Jenkins <email address hidden>
Branch: mcp/ocata

Commit: e6610ee74b5e9169b8fdbd566ad903d295b1fafa
Author: Yuriy Nesenenko <email address hidden>
Date: Wed Apr 26 13:11:52 2017

Fix in Cinder size with ceph backend if any of ceph-osd is down

If any of ceph-osd nodes was disconnected/shutdown, then Cinder is
unable to create volumes because 'cinder-volume' show that there
is 0 free space on the storage, despite the fact that 'ceph status'
show a lot of free space.

Closes-Bug: #1481785
Change-Id: I30bf70667374c2d4b51ac29cac4966298c7dd7fe

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/cinder (mcp/newton)

Reviewed: https://review.fuel-infra.org/33469
Submitter: Pkgs Jenkins <email address hidden>
Branch: mcp/newton

Commit: c576ea30ab0b3d5f16003fa708e7541074e7a7d3
Author: Yuriy Nesenenko <email address hidden>
Date: Thu May 4 12:46:53 2017

Fix in Cinder size with ceph backend if any of ceph-osd is down

If any of ceph-osd nodes was disconnected/shutdown, then Cinder is
unable to create volumes because 'cinder-volume' show that there
is 0 free space on the storage, despite the fact that 'ceph status'
show a lot of free space.

Closes-Bug: #1481785
Change-Id: I30bf70667374c2d4b51ac29cac4966298c7dd7fe
(cherry picked from commit e6610ee74b5e9169b8fdbd566ad903d295b1fafa)
