nova has no disk space when overcloud is deployed with ceph-ansible

Bug #1717251 reported by Jan Provaznik
Affects: tripleo
Status: Fix Released
Importance: Medium
Assigned to: Jan Provaznik
Milestone: queens-3

Bug Description

If I deploy an overcloud with the following command:
THT=/home/stack/tht
openstack overcloud deploy \
    --templates $THT \
    --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute \
    --ceph-storage-flavor oooq_ceph --block-storage-flavor oooq_blockstorage \
    --swift-storage-flavor oooq_objectstorage --timeout 90 \
    -e $THT/environments/docker.yaml \
    -e $THT/environments/docker-ha.yaml \
    -e ~/docker_registry.yaml \
    -e $THT/environments/ceph-ansible/ceph-ansible.yaml \
    -e /home/stack/cloud-names.yaml \
    -e /home/stack/network-environment.yaml \
    -e $THT/environments/low-memory-usage.yaml \
    -e /home/stack/inject-trust-anchor.yaml \
    -e $THT/environments/disable-telemetry.yaml \
    -e $THT/environments/enable-swap.yaml \
    --ceph-storage-scale 1 --control-scale 1 \
    --ntp-server pool.ntp.org \
    ${DEPLOY_ENV_YAML:+-e $DEPLOY_ENV_YAML} "$@" && status_code=0 || status_code=$?

Then nova hypervisor-stats shows zero local disk space:
(overcloud) [stack@undercloud ~]$ source overcloudrc
(overcloud) [stack@undercloud ~]$ nova hypervisor-stats
+----------------------+-------+
| Property             | Value |
+----------------------+-------+
| count                | 1     |
| current_workload     | 0     |
| disk_available_least | 0     |
| free_disk_gb         | 0     |
| free_ram_mb          | 4095  |
| local_gb             | 0     |
| local_gb_used        | 0     |
| memory_mb            | 8191  |
| memory_mb_used       | 4096  |
| running_vms          | 0     |
| vcpus                | 2     |
| vcpus_used           | 0     |
+----------------------+-------+

but the nova-compute service is up:
(overcloud) [stack@undercloud ~]$ nova service-list
+--------------------------------------+------------------+-------------------------------------+----------+---------+-------+----------------------------+-----------------+-------------+
| Id                                   | Binary           | Host                                | Zone     | Status  | State | Updated_at                 | Disabled Reason | Forced down |
+--------------------------------------+------------------+-------------------------------------+----------+---------+-------+----------------------------+-----------------+-------------+
| 8d2832fb-331e-4c97-b9c7-5ee6ba427e4a | nova-scheduler   | overcloud-controller-0.localdomain  | internal | enabled | up    | 2017-09-14T12:37:55.000000 | -               | False       |
| ee6904c7-1a51-4082-8edc-296b1ef0d4ec | nova-consoleauth | overcloud-controller-0.localdomain  | internal | enabled | up    | 2017-09-14T12:37:55.000000 | -               | False       |
| 92f509d0-0117-4e6c-8295-ccb7b9afaae1 | nova-conductor   | overcloud-controller-0.localdomain  | internal | enabled | up    | 2017-09-14T12:37:51.000000 | -               | False       |
| 9f4f7a12-8c5d-4b29-842b-0fbe40c850bf | nova-compute     | overcloud-novacompute-0.localdomain | nova     | enabled | up    | 2017-09-14T12:37:54.000000 | -               | False       |
+--------------------------------------+------------------+-------------------------------------+----------+---------+-------+----------------------------+-----------------+-------------+

If I use exactly the same command to deploy the overcloud, but without "-e $THT/environments/ceph-ansible/ceph-ansible.yaml", then I see non-zero local_gb/free_disk_gb values.
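One way to confirm that the ceph-ansible environment really switched nova to RBD-backed disks is to grep the generated nova.conf inside the compute container. The container name and config path below are the usual TripleO defaults and may differ between releases, so treat them as a guess:

[heat-admin@overcloud-novacompute-0 ~]$ sudo docker exec nova_compute grep -E 'images_type|images_rbd_pool' /etc/nova/nova.conf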

It looks like nova-compute fails to connect to the ceph cluster when checking available pool size, but there is no error in /var/log/containers/nova/nova-compute.log other than:
2017-09-14 11:11:11.888 5 WARNING nova.scheduler.client.report [req-70c64ea4-e2c3-4cb7-98c1-722b51533fcc - - - - -] [req-92ef5167-995b-4cc2-b9e3-dba18b359d64] Failed to update inventory for resource provider 1141f7e7-0c86-40a3-b1af-c2bb21e1a16d: 400 {"errors": [{"status": 400, "request_id": "req-92ef5167-995b-4cc2-b9e3-dba18b359d64", "detail": "The server could not comply with the request since it is either malformed or otherwise incorrect.\n\n JSON does not validate: 0.0 is less than the minimum of 1 Failed validating 'minimum' in schema['properties']['inventories']['patternProperties']['^[A-Z0-9_]+$']['properties']['max_unit']: {'maximum': 2147483647, 'minimum': 1, 'type': 'integer'} On instance['inventories'][u'DISK_GB']['max_unit']: 0", "title": "Bad Request"}]}

Changed in tripleo:
milestone: none → queens-1
Changed in tripleo:
status: New → Triaged
importance: Undecided → High
assignee: nobody → John Fulton (jfulton-org)
Changed in tripleo:
importance: High → Medium
Revision history for this message
Jan Provaznik (jan-provaznik) wrote :

Giulio pointed out (thanks) that this happens because the OSD has only ~500MB of free space, and since the size is reported in whole GB it gets rounded down to 0. We may need to increase the extradisks size in tripleo-quickstart to make sure that default deployments pass (off-topic: the size increase may be necessary for Luminous anyway - https://review.openstack.org/#/c/505122/).
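To spell out the rounding (just a sketch of the arithmetic, not nova's actual code): the DISK_GB inventory is an integer number of GB, so ~500MB of free space truncates to 0, which is exactly the max_unit=0 that placement rejects above:

$ echo $(( 500 * 1024 * 1024 / (1024 * 1024 * 1024) ))
0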

Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Revision history for this message
John Fulton (jfulton-org) wrote :

Jan,

I hope you don't mind that I assigned this bug to you. As per your comment, and because your patch, https://review.openstack.org/#/c/505122, has merged, can we close this?

  John

Changed in tripleo:
assignee: John Fulton (jfulton-org) → nobody
assignee: nobody → Jan Provaznik (jan-provaznik)
Changed in tripleo:
status: Triaged → Fix Committed
Changed in tripleo:
status: Fix Committed → Fix Released