gate-tempest-dsvm-full-devstack-plugin-ceph-ubuntu-xenial fails randomly with NoValidHost - no capacity

Bug #1645530 reported by Matt Riedemann
Affects                Status         Importance   Assigned to      Milestone
Cinder                 Invalid        Undecided    Unassigned
devstack-plugin-ceph   Fix Released   Undecided    Matt Riedemann

Bug Description

Based on this test:

http://logs.openstack.org/25/403925/1/check/gate-tempest-dsvm-full-devstack-plugin-ceph-ubuntu-xenial/58cbafc/console.html#_2016-11-28_23_04_57_200125

I find that the volume goes to the ERROR state because of a NoValidHost failure in the cinder scheduler here:

http://logs.openstack.org/25/403925/1/check/gate-tempest-dsvm-full-devstack-plugin-ceph-ubuntu-xenial/58cbafc/logs/screen-c-sch.txt.gz#_2016-11-28_22_46_08_438

2016-11-28 22:46:08.438 DEBUG cinder.scheduler.base_filter [req-42b3f314-f49a-49dd-bfac-ca56eb249283 tempest-VolumesListAdminV2TestJSON-290067758] Filter CapabilitiesFilter returned 0 host(s) get_filtered_objects /opt/stack/new/cinder/cinder/scheduler/base_filter.py:132
2016-11-28 22:46:08.438 DEBUG cinder.scheduler.base_filter [req-42b3f314-f49a-49dd-bfac-ca56eb249283 tempest-VolumesListAdminV2TestJSON-290067758] Filtering removed all hosts for the request with volume ID '755b941b-2211-4a47-b252-d67b7138e0f4'. Filter results: [('AvailabilityZoneFilter', [u'ubuntu-xenial-osic-cloud1-s3700-5847744@ceph#ceph']), ('CapacityFilter', []), ('CapabilitiesFilter', [])] _log_filtration /opt/stack/new/cinder/cinder/scheduler/base_filter.py:86
2016-11-28 22:46:08.438 INFO cinder.scheduler.base_filter [req-42b3f314-f49a-49dd-bfac-ca56eb249283 tempest-VolumesListAdminV2TestJSON-290067758] Filtering removed all hosts for the request with volume ID '755b941b-2211-4a47-b252-d67b7138e0f4'. Filter results: AvailabilityZoneFilter: (start: 1, end: 1), CapacityFilter: (start: 1, end: 0), CapabilitiesFilter: (start: 0, end: 0)
2016-11-28 22:46:08.439 WARNING cinder.scheduler.filter_scheduler [req-42b3f314-f49a-49dd-bfac-ca56eb249283 tempest-VolumesListAdminV2TestJSON-290067758] No weighed hosts found for volume with properties: {'name': u'ceph', 'qos_specs_id': None, 'deleted': False, 'created_at': '2016-11-28T22:27:52.000000', 'updated_at': None, 'extra_specs': {u'volume_backend_name': u'ceph'}, 'is_public': True, 'deleted_at': None, 'id': '1b56055d-2d73-4d54-a3cc-e9238c1095ea', 'projects': [], 'description': None}

From the filter results, the CapacityFilter removed the only host (start: 1, end: 0), so the CapabilitiesFilter had nothing left to evaluate.

This also looks like a regression on the master branch (ocata) as of 11/26:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22No%20weighed%20hosts%20found%20for%20volume%20with%20properties%5C%22%20AND%20message%3A%5C%22%7Bu'volume_backend_name'%3A%20u'ceph'%7D%5C%22%20AND%20tags%3A%5C%22screen-c-sch.txt%5C%22&from=7d

Revision history for this message
Matt Riedemann (mriedem) wrote :

I'm not seeing any obvious changes in devstack, devstack-plugin-ceph or cinder repos since 11/25 or 11/26, so it might be something that changed in Tempest.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Right before the failure, we can see there is less than 1GB of free capacity on the host here:

http://logs.openstack.org/25/403925/1/check/gate-tempest-dsvm-full-devstack-plugin-ceph-ubuntu-xenial/58cbafc/logs/screen-c-sch.txt.gz#_2016-11-28_22_45_57_249

2016-11-28 22:45:57.249 DEBUG cinder.scheduler.filter_scheduler [req-682ea816-7fdb-4e22-89a8-49419be16b11 tempest-VolumesListAdminV2TestJSON-1686276434] Filtered [host 'ubuntu-xenial-osic-cloud1-s3700-5847744@ceph#ceph': free_capacity_gb: 0.76, pools: None] _get_weighted_candidates /opt/stack/new/cinder/cinder/scheduler/filter_scheduler.py:333
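The CapacityFilter's pass/fail decision boils down to a simple comparison, sketched here in shell as a hedged illustration (this is not Cinder's actual Python code; the 0.76 figure comes from the log line above, and 1G is the typical Tempest volume size):

```shell
# Illustrative only: approximate the CapacityFilter's decision for this host.
free_capacity_gb=0.76    # from the scheduler log above
requested_size_gb=1      # typical Tempest test volume size

# awk handles the floating-point comparison
if awk -v free="$free_capacity_gb" -v req="$requested_size_gb" \
       'BEGIN { exit !(free >= req) }'; then
    echo "host passes CapacityFilter"
else
    echo "host filtered out: insufficient free capacity"
fi
```

With 0.76G free against a 1G request, the host is filtered out, which matches the "CapacityFilter: (start: 1, end: 0)" result in the log.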

Revision history for this message
Matt Riedemann (mriedem) wrote :

Looks like we have at most 8GB of free capacity on the host at any given time, which seems odd; I'd expect bigger disks on these CI hosts. That wouldn't even cover the default volume quota for a single project:

#quota_volumes = 10
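The mismatch is easy to see with the default numbers (a hedged back-of-the-envelope check; the 8G pool size is the observed ceph backing disk, and 1G per volume is the typical Tempest test volume size):

```shell
# Illustrative arithmetic: default cinder quota vs. the observed ~8G ceph pool.
quota_volumes=10     # cinder's default quota_volumes
volume_size_gb=1     # typical Tempest test volume size
pool_size_gb=8       # observed ceph backing disk size

max_quota_demand_gb=$((quota_volumes * volume_size_gb))
echo "one project's quota allows ${max_quota_demand_gb}G; pool holds ${pool_size_gb}G"
if [ "$max_quota_demand_gb" -gt "$pool_size_gb" ]; then
    echo "a single project can exhaust the pool without exceeding quota"
fi
```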

Revision history for this message
Matt Riedemann (mriedem) wrote :

For comparison, the LVM volume group in the same job is created at 24G:

2016-11-28 22:33:01.465 | + lib/lvm:init_lvm_volume_group:119 : _create_lvm_volume_group stack-volumes-lvmdriver-1 24G

Revision history for this message
Matt Riedemann (mriedem) wrote :
Changed in cinder:
status: New → Invalid
Revision history for this message
Matt Riedemann (mriedem) wrote :

devstack defaults the volume backing file to 10GB, but devstack-gate bumps it to 24GB when running Tempest:

http://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/devstack-vm-gate.sh#n475

    if [[ "$DEVSTACK_GATE_TEMPEST" -eq "1" ]]; then
        # Volume tests in Tempest require a number of volumes
        # to be created, each of 1G size. Devstack's default
        # volume backing file size is 10G.
        #
        # The 24G setting is expected to be enough even
        # in parallel run.
        echo "VOLUME_BACKING_FILE_SIZE=24G" >> "$localrc_file"

Changed in devstack-plugin-ceph:
status: New → In Progress
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack-plugin-ceph (master)

Reviewed: https://review.openstack.org/403988
Committed: https://git.openstack.org/cgit/openstack/devstack-plugin-ceph/commit/?id=9293ac03aba0cabd7148c6a9940ac12c57bef0f1
Submitter: Jenkins
Branch: master

commit 9293ac03aba0cabd7148c6a9940ac12c57bef0f1
Author: Matt Riedemann <email address hidden>
Date: Mon Nov 28 21:52:42 2016 -0500

    Create backing disk using $VOLUME_BACKING_FILE_SIZE

    The backing disk currently created is 8GB. devstack-gate
    sets that to 24GB when running Tempest. We're seeing ceph
    job failures due to NoValidHost in the cinder scheduler
    because 8GB isn't enough capacity for Tempest runs. So this
    change uses the same backing disk size for the ceph jobs as
    we get in the default devstack setup, which uses LVM.

    Depends-On: I71be308c8373e9ac429b901c374100c6b3c1e59d

    Change-Id: I788eefa6c1d427bf51d2d3d40be4abe0336443e7
    Closes-Bug: #1645530
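In outline, the merged change sizes the ceph backing file from $VOLUME_BACKING_FILE_SIZE instead of a hard-coded 8GB. A hedged sketch of what that amounts to (the file path, fallback value, and variable handling here are illustrative; see the linked commit for the actual change):

```shell
# Illustrative sketch only: VOLUME_BACKING_FILE_SIZE is devstack's variable;
# the file path and the fallback value used here are hypothetical.
: "${VOLUME_BACKING_FILE_SIZE:=24G}"   # devstack-gate's Tempest setting
CEPH_LOOPBACK_DISK_IMAGE=/tmp/ceph-backing-file.img

# truncate creates a sparse file of the requested size without writing data
truncate -s "$VOLUME_BACKING_FILE_SIZE" "$CEPH_LOOPBACK_DISK_IMAGE"
ls -lh "$CEPH_LOOPBACK_DISK_IMAGE"
```

Because the backing file is sparse, a 24G file consumes almost no real disk until volumes are actually written.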

Changed in devstack-plugin-ceph:
status: In Progress → Fix Released