VIM compute plugin used before initialized

Bug #1840668 reported by Bart Wensley on 2019-08-19
Affects: StarlingX
Status: Triaged
Importance: Low
Assigned to: Bart Wensley

Bug Description

Brief Description
-----------------
In some cases, the VIM's compute plugin can be used before it has been initialized. The resulting exceptions cause the VIM to restart. Although recovery is automatic, this behaviour should be cleaned up.

Severity
--------
Minor: This only happens at init time (when the stx-openstack application is installed) and recovery is automatic.

Steps to Reproduce
------------------
Install the stx-openstack application.

Expected Behavior
------------------
When the VIM is restarted after the installation of the stx-openstack application, it should not attempt to use the compute plugin until the plugin has been fully initialized. Initialization may be delayed, because the plugin cannot initialize until the rabbitmq and nova pods are running.
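
As an illustration only, the delayed initialization described above could look like the sketch below: initialization is retried until the dependencies are reachable, and the plugin is only marked usable once an attempt succeeds. ComputePluginStub, initialize_with_retry and their parameters are hypothetical names made up for this example and are not part of the VIM code.

```python
import time


class ComputePluginStub(object):
    """Hypothetical stand-in for the compute plugin (not the real VIM class)."""

    def __init__(self, failures_before_success):
        # Simulates rabbitmq/nova pods that are not yet reachable.
        self._remaining_failures = failures_before_success
        self.initialized = False

    def initialize(self):
        if self._remaining_failures > 0:
            self._remaining_failures -= 1
            raise OSError("timed out")
        self.initialized = True


def initialize_with_retry(plugin, max_attempts=5, delay_secs=0.0,
                          sleep=time.sleep):
    """Retry plugin.initialize() until it succeeds or attempts run out.

    Returns the attempt number that succeeded, or None if every attempt
    failed. The plugin is left uninitialized on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            plugin.initialize()
            return attempt
        except OSError:
            # Dependencies not up yet: wait and try again.
            if attempt < max_attempts:
                sleep(delay_secs)
    return None
```

Callers would check plugin.initialized (or the return value) before issuing any compute operations, rather than assuming that initialization completed on the first attempt.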

Actual Behavior
----------------
In some cases, the VIM will attempt to use the compute plugin before it has been initialized. This is typically caused by one of the VIM's audits running before the initialization is complete. It results in exceptions similar to the following:

2019-08-07T20:24:07.560 controller-1 VIM_Thread[2857929] ERROR Caught exception while trying to query controller-0 nova services, error='NoneType' object has no attribute 'get_service_info'.
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/nfvi_compute_api.py", line 749, in query_host_services
    if self._host_supports_nova_compute(host_personality):
  File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/nfvi_compute_api.py", line 386, in _host_supports_nova_compute
    (self._directory.get_service_info(
AttributeError: 'NoneType' object has no attribute 'get_service_info'

2019-08-07T20:24:09.113 controller-1 VIM_Thread[2857929] ERROR Caught exception while trying to get hypervisor list, error=AttributeError: 'NoneType' object has no attribute 'auth_uri'.
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/nfvi_compute_api.py", line 951, in get_hypervisors
    future.result = (yield)
Exception: AttributeError: 'NoneType' object has no attribute 'auth_uri'
2019-08-07T20:24:09.113 controller-1 VIM_Thread[2857929] ERROR _vim_nfvi_audits.py.274 Audit-Hypervisors callback, not completed, responses={'completed': False, 'reason': ''}.
2019-08-07T20:24:14.975 controller-1 VIM_Thread[2857929] INFO nfvi_guest_api.py.974 Content={"hostname": "controller-0","uuid":"f6021db1-3abf-488c-8814-647e2040c154"}, len=74
2019-08-07T20:24:14.975 controller-1 VIM_Thread[2857929] DEBUG _vim_nfvi_events.py.332 Host-Services query, host_name=controller-0.
2019-08-07T20:24:14.975 controller-1 VIM_Thread[2857929] DEBUG nfvi_guest_api.py.1003 Host rest-api get path: /nfvi-plugins/v1/hosts/f6021db1-3abf-488c-8814-647e2040c154.
2019-08-07T20:24:21.932 controller-1 VIM_Thread[2857929] ERROR timed out
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/nfv_vim/vim.py", line 206, in process_main
    init_complete = process_reinitialize()
  File "/usr/lib64/python2.7/site-packages/nfv_vim/vim.py", line 100, in process_reinitialize
    if not nfvi.nfvi_reinitialize(config.CONF['nfvi']):
  File "/usr/lib64/python2.7/site-packages/nfv_vim/nfvi/_nfvi_module.py", line 119, in nfvi_reinitialize
    _task_worker_pools['compute'])
  File "/usr/lib64/python2.7/site-packages/nfv_vim/nfvi/_nfvi_compute_module.py", line 464, in nfvi_compute_initialize
    _compute_plugin.initialize(config['config_file'])
  File "/usr/lib64/python2.7/site-packages/nfv_vim/nfvi/_nfvi_plugin.py", line 94, in initialize
    self._plugin.obj.initialize(config_file)
  File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/nfvi_compute_api.py", line 3393, in initialize
    'notifications.nfvi_nova_listener_queue')
  File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/openstack/rpc_listener.py", line 42, in __init__
    self._consumer = Consumer(self._connection, self._rpc_receive_queue)
  File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 386, in __init__
    self.revive(self.channel)
  File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 399, in revive
    channel = self.channel = maybe_channel(channel)
  File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 973, in maybe_channel
    return channel.default_channel
  File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 819, in default_channel
    self.connection
  File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 802, in connection
    self._connection = self._establish_connection()
  File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 757, in _establish_connection
    conn = self.transport.establish_connection()
  File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 130, in establish_connection
    conn.connect()
  File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 294, in connect
    self.transport.connect()
  File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 120, in connect
    self._connect(self.host, self.port, self.connect_timeout)
  File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 177, in _connect
    "failed to resolve broker hostname"))
timeout: timed out

Reproducibility
---------------
Intermittent

System Configuration
--------------------
Most likely to be seen on a one- or two-node system with high CPU load.

Branch/Pull Time/Commit
-----------------------
Load from on or about August 12, 2019.

Last Pass
---------
N/A

Timestamp/Logs
--------------
See above.

Test Activity
-------------
Developer testing (found by Al Bailey)

Changed in starlingx:
assignee: nobody → Bart Wensley (bartwensley)
Bart Wensley (bartwensley) wrote :

Looking at the code, I think the solution is likely going to involve adding a new nfvi_compute_plugin_initialized function to nfv_vim/nfvi/_nfvi_compute_module.py and using that to decide whether operations on this plugin should be attempted.
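
A rough sketch of that idea, under stated assumptions: only the name nfvi_compute_plugin_initialized comes from the proposal above. The module-level flag, compute_initialize_sketch and audit_hypervisors_sketch are hypothetical and heavily simplified, and do not reflect the real signatures in _nfvi_compute_module.py.

```python
# Hypothetical module-level flag; the real module may track this differently.
_compute_plugin_initialized = False


def nfvi_compute_plugin_initialized():
    """Return True once the compute plugin has finished initializing."""
    return _compute_plugin_initialized


def compute_initialize_sketch(init_plugin):
    """Run the plugin's init callable and record success.

    If init_plugin raises (e.g. rabbitmq is not reachable yet), the flag
    stays False and the exception propagates to the caller.
    """
    global _compute_plugin_initialized
    _compute_plugin_initialized = False
    init_plugin()
    _compute_plugin_initialized = True


def audit_hypervisors_sketch(get_hypervisors):
    """Skip the audit cycle instead of touching an uninitialized plugin."""
    if not nfvi_compute_plugin_initialized():
        return None
    return get_hypervisors()
```

With a guard like this, an audit that fires before initialization completes would simply return without calling into the plugin, and would be retried on its next cycle.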

Ghada Khalil (gkhalil) wrote :

Marking as Low / not gating - code cleanup

tags: added: stx.nfv
Changed in starlingx:
status: New → Triaged
importance: Undecided → Low