Description
===========
In an environment with multiple compute nodes running the ironic driver,
when a compute node goes down, another compute node cannot take over its ironic nodes.
Steps to reproduce
==================
1. Start multiple compute nodes with the ironic driver.
2. Register one node with ironic.
3. Stop the compute node that manages the ironic node.
4. Create an instance.
Expected result
===============
The instance is created.
Actual result
=============
The instance creation fails.
Environment
===========
1. Exact version of OpenStack you are running.
openstack-nova-scheduler-15.0.6-2.el7.noarch
openstack-nova-console-15.0.6-2.el7.noarch
python2-novaclient-7.1.0-1.el7.noarch
openstack-nova-common-15.0.6-2.el7.noarch
openstack-nova-serialproxy-15.0.6-2.el7.noarch
openstack-nova-placement-api-15.0.6-2.el7.noarch
python-nova-15.0.6-2.el7.noarch
openstack-nova-novncproxy-15.0.6-2.el7.noarch
openstack-nova-api-15.0.6-2.el7.noarch
openstack-nova-conductor-15.0.6-2.el7.noarch
2. Which hypervisor did you use?
ironic
Details
=======
When a nova-compute service goes down, another nova-compute takes over the ironic nodes managed by the failed service by re-balancing the hash ring. The now-active nova-compute then tries to create a new resource provider with a new ComputeNode object UUID and the hypervisor name (the ironic node name) [1][2][3]. This creation fails with a conflict (409) because a resource provider with the same name was already created by the failed nova-compute.
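The conflict can be reproduced directly against the placement API. Below is a minimal sketch (not nova's code); the placement endpoint, token, and node name are assumptions for illustration only:

    # Minimal sketch (not nova's code) of the conflicting placement call; the
    # endpoint, token, and node name below are hypothetical.
    import uuid
    import requests

    PLACEMENT = "http://placement.example.com/placement"   # hypothetical endpoint
    HEADERS = {
        "x-auth-token": "ADMIN_TOKEN",                      # hypothetical token
        "OpenStack-API-Version": "placement 1.4",
    }

    def create_resource_provider(name, rp_uuid):
        # POST /resource_providers returns 409 Conflict when a provider with
        # the same name (or UUID) already exists.
        resp = requests.post(
            PLACEMENT + "/resource_providers",
            json={"name": name, "uuid": rp_uuid},
            headers=HEADERS,
        )
        return resp.status_code

    # The failed nova-compute already registered a provider named after the
    # ironic node -> 201 Created.
    print(create_resource_provider("ironic-node-0", str(uuid.uuid4())))

    # After the hash-ring rebalance, the surviving nova-compute retries with a
    # *new* ComputeNode UUID but the *same* hypervisor name -> 409 Conflict.
    print(create_resource_provider("ironic-node-0", str(uuid.uuid4())))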
When a new instance is requested, the scheduler gets only the old resource provider for the ironic node [4], so the ironic node is not selected:
WARNING nova.scheduler.filters.compute_filter [req-a37d68b5-7ab1-4254-8698-502304607a90 7b55e61a07304f9cab1544260dcd3e41 e21242f450d948d7af2650ac9365ee36 - - -] (compute02, 8904aeeb-a35b-4ba3-848a-73269fdde4d3) ram: 4096MB disk: 849920MB io_ops: 0 instances: 0 has not been heard from in a while
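The warning above comes from the scheduler's ComputeFilter treating the failed nova-compute service as down while it still owns the only resource provider for the node. A rough sketch of that kind of liveness check follows (not nova's actual filter code; the 60-second threshold is an assumed value):

    # Minimal sketch (not nova's code) of the liveness check behind the
    # "has not been heard from in a while" warning.
    from datetime import datetime, timedelta

    SERVICE_DOWN_TIME = timedelta(seconds=60)   # hypothetical threshold

    def service_is_up(last_heartbeat, now=None):
        # A host is only schedulable while its nova-compute service keeps
        # reporting in; a stale heartbeat means the host is filtered out.
        now = now or datetime.utcnow()
        return now - last_heartbeat <= SERVICE_DOWN_TIME

    # The failed nova-compute stopped reporting long ago, yet it still "owns"
    # the only resource provider for the ironic node, so the node is rejected
    # and the instance cannot be scheduled.
    stale = datetime.utcnow() - timedelta(minutes=10)
    print(service_is_up(stale))   # False -> host rejected by the filter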
[1] https://github.com/openstack/nova/blob/stable/ocata/nova/compute/resource_tracker.py#L464
[2] https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/client/report.py#L630
[3] https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/client/report.py#L410
[4] https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/filter_scheduler.py#L183