resize or migrate instance fails if two compute hosts are set to different rbd pools

Bug #1633990 reported by LIU Yulong
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

We are facing a Nova operational issue when setting a different Ceph RBD pool on each Nova compute node within one availability zone. For instance:
(1) compute-node-1 in az1 with images_rbd_pool=pool1
(2) compute-node-2 in az1 with images_rbd_pool=pool2
This setting normally works fine.
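
For reference, a minimal sketch of the relevant nova.conf sections for this setup (images_type and images_rbd_pool are the standard [libvirt] options; the rbd_user and ceph.conf path below are assumptions for illustration):

    # /etc/nova/nova.conf on compute-node-1
    [libvirt]
    images_type = rbd
    images_rbd_pool = pool1
    images_rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = nova

    # /etc/nova/nova.conf on compute-node-2
    [libvirt]
    images_type = rbd
    images_rbd_pool = pool2
    images_rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = nova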

But a problem is encountered when resizing or migrating an instance. For example, when trying to resize instance-1, which originally runs on compute-node-1, Nova runs its scheduling procedure; assume nova-scheduler picks compute-node-2 as the destination. Nova then fails with the following error:
http://paste.openstack.org/show/585540/

This exception occurs because, on compute-node-2, Nova cannot find instance-1's disk in pool1. So, is there a way Nova can handle this? Cinder handles a similar situation: a Cinder volume has a host attribute of the form
host_name@backend_name#pool_name.
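
For illustration, for an RBD backend named ceph the volume's host field would look something like this (the host and pool names here are hypothetical):

    os-vol-host-attr:host = cinder-host-1@ceph#pool1

Because the pool is encoded in the volume's host, the Cinder scheduler can place and keep volumes per pool; Nova has no equivalent notion for its ephemeral RBD disks.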

The reason we use this setup is that, when expanding storage capacity, we want to avoid the impact of Ceph rebalancing.

One workaround we found is the AggregateInstanceExtraSpecsFilter, which matches host aggregate metadata against flavor extra specs (a CLI sketch follows the list below).
We tried creating host aggregates like:
az1-pool1 with hosts compute-node-1, and metadata {ceph_pool: pool1};
az1-pool2 with hosts compute-node-2, and metadata {ceph_pool: pool2};
and create flavors like:
flavor1-pool1 with metadata {ceph_pool: pool1};
flavor2-pool1 with metadata {ceph_pool: pool1};
flavor1-pool2 with metadata {ceph_pool: pool2};
flavor2-pool2 with metadata {ceph_pool: pool2};
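
A minimal sketch of that setup with the openstack CLI (aggregate and flavor names are the ones listed above; the flavor sizes are placeholders, and the exact scheduler option name varies by Nova release):

    # aggregate per pool, tagged with its ceph pool
    openstack aggregate create --zone az1 az1-pool1
    openstack aggregate add host az1-pool1 compute-node-1
    openstack aggregate set --property ceph_pool=pool1 az1-pool1

    # flavor whose scoped extra spec must match the aggregate metadata
    openstack flavor create --vcpus 2 --ram 4096 --disk 40 flavor1-pool1
    openstack flavor set --property aggregate_instance_extra_specs:ceph_pool=pool1 flavor1-pool1

    # AggregateInstanceExtraSpecsFilter must also be enabled in the
    # scheduler filter list in nova.conf (scheduler_default_filters on
    # older releases, [filter_scheduler]/enabled_filters on newer ones)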

But this may introduce a new issue at instance-creation time: which flavor should be used? The business/application layer seemingly needs to add its own flavor-selection logic. It can also cause a compute capacity issue: if one flavor is chosen for the resize, the scheduler will use the AggregateInstanceExtraSpecsFilter to restrict the destination hosts to those in the same RBD pool. What if there is no available compute host in that pool? What if none has enough memory or CPU? So this is not an ideal solution.

So, finally, I want to ask: is there a best practice for using multiple Ceph RBD pools in one availability zone?

LIU Yulong (dragon889)
summary: - resize or migrate instance will failed if two compute host are set in
- different rbd pool
+ resize or migrate instance will be failed if two compute hosts are set
+ in different rbd pool
summary: - resize or migrate instance will be failed if two compute hosts are set
+ resize or migrate instance will get failed if two compute hosts are set
in different rbd pool
Revision history for this message
LIU Yulong (dragon889) wrote :

If Cinder is used to boot the VMs, then the following bug becomes an issue:
https://bugs.launchpad.net/nova/+bug/1474253

Revision history for this message
Matt Riedemann (mriedem) wrote :

I think this is basically going to have to be fixed by the generic resource providers work that's going on, i.e. you have different sets of aggregates (compute hosts) that use different shared storage pools, and when you migrate an instance using shared storage on one of the hosts, you migrate to another host in that same aggregate.

LIU Yulong (dragon889)
description: updated
Revision history for this message
Sean Dague (sdague) wrote :

Until the resource providers work is done, there is no fix for this. Under the current architecture this is a Won't Fix.

Changed in nova:
status: New → Won't Fix