Nova cell is not set up correctly in Packstack Ocata release

Bug #1701032 reported by jethro.sun on 2017-06-28
This bug affects 1 person
Affects    Importance  Assigned to
Packstack  Undecided   Unassigned
Rally      Undecided   Unassigned

Bug Description

Starting with the Ocata release, Nova is supposed to have Cells v2 (https://docs.openstack.org/developer/nova/cells.html#cells-v2) set up by default.

Packstack does not currently take care of this, so many Nova operations fail. For example, most of my Nova workloads fail because of the missing cell setup.

Some error messages:
==================

GetResourceErrorStatus: Resource <Server: s_rally_2b7dd6bd_xqmrFaRK> has ERROR status.
Fault: {u'message': u"Host 'demo' is not mapped to any cell", u'code': 400, u'created': u'2017-06-28T01:53:05Z'}

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/rally/task/runner.py", line 72, in _run_scenario_once
    getattr(scenario_inst, method_name)(**scenario_kwargs)
  File "/usr/lib/python2.7/site-packages/rally/plugins/openstack/scenarios/nova/servers.py", line 415, in run
    server = self._boot_server(image, flavor, **boot_server_kwargs)
  File "/usr/lib/python2.7/site-packages/rally/task/atomic.py", line 84, in func_atomic_actions
    f = func(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/rally/plugins/openstack/scenarios/nova/utils.py", line 146, in _boot_server
    check_interval=CONF.benchmark.nova_server_boot_poll_interval
  File "/usr/lib/python2.7/site-packages/rally/task/utils.py", line 214, in wait_for_status
    resource = update_resource(resource)
  File "/usr/lib/python2.7/site-packages/rally/task/utils.py", line 90, in _get_from_manager
    fault=getattr(res, "fault", "n/a"))
GetResourceErrorStatus: Resource <Server: s_rally_2b7dd6bd_xqmrFaRK> has ERROR status.
Fault: {u'message': u"Host 'demo' is not mapped to any cell", u'code': 400, u'created': u'2017-06-28T01:53:05Z'}

Alfredo Moralejo (amoralej) wrote :

Could you upload the Packstack answers file and the following log files:

/var/tmp/packstack/latest/*log
/var/log/nova/*

Is this an all-in-one installation?

Note that Packstack has run the cell_v2 discover_hosts command since Ocata, so there may be an issue with ordering or something similar.

Finally, could you paste the output of this command:

nova service-list

jethro.sun (jethro-sun7) wrote :

Hi Alfredo,

Yes, it is an all-in-one installation. I think I tried running the cell_v2 commands manually, with no luck. It should work out of the box, right?

The nova service-list output is:

[root@cluster centos(keystone_admin)]# nova service-list

/usr/lib/python2.7/site-packages/novaclient/client.py:278: UserWarning: The 'tenant_id' argument is deprecated in Ocata and its use may result in errors in future releases. As 'project_id' is provided, the 'tenant_id' argument will be ignored.
  warnings.warn(msg)
+----+------------------+------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary           | Host             | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+----+------------------+------------------+----------+---------+-------+----------------------------+-----------------+
| 8  | nova-cert        | cluster          | internal | enabled | down  | 2017-06-28T13:46:21.000000 | -               |
| 9  | nova-conductor   | cluster          | internal | enabled | down  | 2017-06-28T13:46:21.000000 | -               |
| 14 | nova-scheduler   | cluster          | internal | enabled | down  | 2017-06-28T13:46:15.000000 | -               |
| 15 | nova-consoleauth | cluster          | internal | enabled | down  | 2017-06-28T13:46:21.000000 | -               |
| 16 | nova-compute     | cluster          | nova     | enabled | down  | 2017-06-28T13:46:15.000000 | -               |
| 17 | nova-conductor   | cluster.moclocal | internal | enabled | up    | 2017-06-29T12:55:38.000000 | -               |
| 19 | nova-consoleauth | cluster.moclocal | internal | enabled | up    | 2017-06-29T12:55:38.000000 | -               |
| 20 | nova-cert        | cluster.moclocal | internal | enabled | up    | 2017-06-29T12:55:31.000000 | -               |
| 21 | nova-scheduler   | cluster.moclocal | internal | enabled | up    | 2017-06-29T12:55:31.000000 | -               |
| 22 | nova-compute     | cluster.moclocal | nova     | enabled | up    | 2017-06-29T12:55:37.000000 | -               |
+----+------------------+------------------+----------+---------+-------+----------------------------+-----------------+

Launchpad won't allow me to upload multiple files, so I have hosted them in my GitHub repo: https://github.com/shwsun/xavier-contrib/blob/master/packstack/bug-report/

Alfredo Moralejo (amoralej) wrote :

Hi,

Yes, it should work out of the box.

There may be multiple problems in your case. One thing I see, according to the service-list output, is that the hostname changed from cluster to cluster.moclocal after a reboot. This is a problem and you should fix it. When Packstack ran, the node was named cluster; that was the name assigned to the compute node and (if everything went fine) added to the cell. After the reboot, the node name changed to cluster.moclocal, and from Nova's point of view that is a different node.
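The rename Alfredo points out can be spotted mechanically in the service-list output: every service under the old name is down, while the same services under the new name are up. A minimal sketch, assuming the pipe-delimited table layout shown above (a trimmed sample is used here); `parse_service_list` and `stale_hosts` are hypothetical helpers, not part of novaclient:

```python
# Sketch: spot hostname drift in `nova service-list` output.
# Assumes the pipe-delimited table layout shown in this report;
# parse_service_list and stale_hosts are hypothetical helpers.
from collections import defaultdict


def parse_service_list(table_text):
    """Return one dict per service row, keyed by the column headers."""
    header = None
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        # Skip +----+ borders, client warnings and blank lines.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def stale_hosts(rows):
    """Hosts whose services are all 'down', e.g. a pre-reboot hostname."""
    states = defaultdict(set)
    for row in rows:
        states[row["Host"]].add(row["State"])
    return sorted(host for host, seen in states.items() if seen == {"down"})


if __name__ == "__main__":
    sample = """
+----+--------------+------------------+------+---------+-------+
| Id | Binary       | Host             | Zone | Status  | State |
+----+--------------+------------------+------+---------+-------+
| 16 | nova-compute | cluster          | nova | enabled | down  |
| 22 | nova-compute | cluster.moclocal | nova | enabled | up    |
+----+--------------+------------------+------+---------+-------+
"""
    print(stale_hosts(parse_service_list(sample)))  # ['cluster']
```

A host that is entirely down while a similar name is up is a strong hint of a rename, not proof of one; services can also be down for unrelated reasons.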

At this point, running:

nova-manage cell_v2 discover_hosts

should add cluster.moclocal to the cell, and things should start working.
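Why this helps can be seen from a toy model: `discover_hosts` looks at the compute hosts recorded in each cell and creates a host-to-cell mapping for every host that does not have one yet, which is exactly what the "is not mapped to any cell" fault complains about. The sketch below is a simplified model under that assumption, not Nova's implementation; the cell name `cell1` and the dictionary shapes are hypothetical:

```python
# Toy model of `nova-manage cell_v2 discover_hosts` (NOT nova's code).
# Assumption: the API database keeps a host -> cell mapping, and each
# cell database lists its compute hosts. Names/shapes are hypothetical.


def discover_hosts(cells, host_mappings):
    """Map every compute host that is not mapped to a cell yet.

    cells: dict of cell name -> iterable of compute host names.
    host_mappings: dict of host name -> cell name; updated in place.
    Returns the newly mapped hosts, in discovery order.
    """
    added = []
    for cell_name, compute_hosts in cells.items():
        for host in compute_hosts:
            if host not in host_mappings:
                host_mappings[host] = cell_name
                added.append(host)
    return added


def cell_for(host, host_mappings):
    """Mimic the lookup behind the 'not mapped to any cell' fault."""
    try:
        return host_mappings[host]
    except KeyError:
        raise LookupError("Host '%s' is not mapped to any cell" % host)


if __name__ == "__main__":
    mappings = {"cluster": "cell1"}  # mapped when packstack ran
    cells = {"cell1": ["cluster", "cluster.moclocal"]}  # after the rename
    print(discover_hosts(cells, mappings))  # ['cluster.moclocal']
    print(cell_for("cluster.moclocal", mappings))  # cell1
```

Running the discovery a second time maps nothing new, which matches the expectation that repeating the command is harmless.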

Regarding what happened on the first run, I'm not sure whether it went fine. Could you paste the output of:

grep nova-cell_v2-discover_hosts /var/tmp/packstack/latest/manifest/*log

jethro.sun (jethro-sun7) wrote :

Hi,

You are absolutely right. I noticed that the bare-metal node works fine and I only hit these problems on a VM spawned in OpenStack. It turned out that the hostname I specified in OpenStack is cluster, but /etc/hostname contains cluster.moclocal. I need to run `nova-manage cell_v2 discover_hosts` manually to resolve it.
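A mismatch like this can be caught before Nova trips over it by comparing the name the node reports now with the name it was registered under. A minimal sketch, assuming nova.conf does not set `[DEFAULT] host` (nova-compute then falls back to `socket.gethostname()`) and that Nova compares host names as exact strings; `REGISTERED_NAME` is a hypothetical value taken from the Host column at install time:

```python
# Sketch: warn when the node's hostname differs from the name
# nova-compute was registered under. Assumes [DEFAULT] host is unset
# in nova.conf; REGISTERED_NAME is a hypothetical, site-specific value.
import socket

REGISTERED_NAME = "cluster"  # hypothetical: Host column when packstack ran


def read_etc_hostname(path="/etc/hostname"):
    """Return the static hostname from /etc/hostname, or None."""
    try:
        with open(path) as f:
            return f.read().strip() or None
    except IOError:  # missing or unreadable file
        return None


def hostname_drift(registered, current):
    """True if nova would treat `current` as a different compute host
    (assumed exact string match: 'cluster' != 'cluster.moclocal')."""
    return registered != current


if __name__ == "__main__":
    current = socket.gethostname()
    print("socket.gethostname(): %s" % current)
    print("/etc/hostname: %s" % read_etc_hostname())
    if hostname_drift(REGISTERED_NAME, current):
        print("Hostname changed from %r to %r; run "
              "`nova-manage cell_v2 discover_hosts` or restore the "
              "original hostname." % (REGISTERED_NAME, current))
```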

Is there a place to document this finding, or do you think it is some sort of bug? I am super happy this is resolved.

zhangzhihui (zhangzhang) wrote :

Hi, if your problem is resolved, can you run Rally again? I want to confirm whether it is a bug in Rally.

jethro.sun (jethro-sun7) wrote :

Hi,

Yes, the Nova cell problem is solved by running `nova-manage cell_v2 discover_hosts`.

As I explained, a VM spawned by OpenStack gets something extra appended to its hostname (here .moclocal), and that is the root cause.

jethro.sun (jethro-sun7) wrote :

Hi all,

It seems the problem has not gone away yet. I see error messages like this:

```
--------------------------------------------------------------------------------
GetResourceErrorStatus: Resource <Server: s_rally_0624f299_BxFCTGAr> has ERROR status.
Fault: {u'message': u'No valid host was found. There are not enough hosts available.', u'code': 500, u'created': u'2017-07-05T02:51:44Z'}

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/rally/task/runner.py", line 72, in _run_scenario_once
    getattr(scenario_inst, method_name)(**scenario_kwargs)
  File "/usr/lib/python2.7/site-packages/rally/plugins/openstack/scenarios/nova/servers.py", line 869, in run
    server = self._boot_server(image, flavor, **kwargs)
  File "/usr/lib/python2.7/site-packages/rally/task/atomic.py", line 84, in func_atomic_actions
    f = func(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/rally/plugins/openstack/scenarios/nova/utils.py", line 146, in _boot_server
    check_interval=CONF.benchmark.nova_server_boot_poll_interval
  File "/usr/lib/python2.7/site-packages/rally/task/utils.py", line 214, in wait_for_status
    resource = update_resource(resource)
  File "/usr/lib/python2.7/site-packages/rally/task/utils.py", line 90, in _get_from_manager
    fault=getattr(res, "fault", "n/a"))
GetResourceErrorStatus: Resource <Server: s_rally_0624f299_BxFCTGAr> has ERROR status.
Fault: {u'message': u'No valid host was found. There are not enough hosts available.', u'code': 500, u'created': u'2017-07-05T02:51:44Z'}
```

According to some Q&A threads, this means the Nova cell is not configured correctly. Any suggestions? I think simply changing the hostname back to what it is supposed to be could be a temporary workaround.

OpenStack Q&A reference:
https://ask.openstack.org/en/question/103932/novalidhost-no-valid-host-was-found-there-are-not-enough-hosts-available/

Boris Pavlovic (boris-42) wrote :

This is not a Rally problem; no code changes are required in Rally.

Changed in rally:
status: New → Invalid