Brief Description
-----------------
locking compute host unexpectedly fails vim-services on the active controller
(Appears to be an OpenStack Rest-API Exception with the L3/DHCP Agent rescheduling)
Severity
--------
standard
Steps to Reproduce
------------------
1. controller-0 and controller-1 enabled online state (controller-0 is the active controller)
worker nodes (compute-0, compute-1 and compute-2) all enabled and online
2. lock eg. compute-2 and check vim-services state
Expected Behavior
------------------
locking compute host should not fail the vim-services
Actual Behavior
----------------
vim service failed unexpectedly on the active controller when locking compute node
2019-04-09T20:44:51.113687 log 200.022 compute-2 is now 'disabled' host=compute-2.state=disabled not-applicable
...
2019-04-09T20:44:51.572559 set 200.001 compute-2 was administratively locked to take it out-of-service. host=compute-2 warning
...
2019-04-09T20:46:58.680853 log 401.001 Service group vim-services state change from active to active-failed on host controller-0; vim(disabled, failed) service_domain=controller.service_group=vim-services.host=controller-0 critical
vim-services disabled-failed active-failed
see var/log/sm-customer.log (from active controller-0)
| 2019-04-09T20:45:30.240 | 344 | service-scn | vim | enabled-active | disabled | process (pid=72780) failed
| 2019-04-09T20:45:30.706 | 345 | service-scn | vim | disabled | enabling | enabled-active state requested
| 2019-04-09T20:45:30.800 | 346 | service-scn | vim | enabling | enabled-active | enable success
| 2019-04-09T20:46:58.313 | 347 | service-scn | vim | enabled-active | disabled | process (pid=221031) failed
| 2019-04-09T20:46:58.680 | 348 | service-scn | vim | disabled | disabled-failed | enabled-active state requested
| 2019-04-09T20:46:58.680 | 349 | service-group-scn | vim-services | active | active-failed | vim(disabled, failed)
| 2019-04-09T20:46:59.687 | 350 | service-group-scn | vim-services | active-failed | disabling-failed | vim(disabled, failed)
| 2019-04-09T20:46:59.688 | 351 | service-group-scn | cloud-services | active | disabling |
| 2019-04-09T20:46:59.689 | 352 | service-group-scn | controller-services | active | disabling |
| 2019-04-09T20:46:59.690 | 353 | service-group-scn | oam-services | active | disabling |
Reproducibility
---------------
yes
System Configuration
--------------------
2+3
(Lab yow-cgcs-wolfpass-8-12)
Branch/Pull Time/Commit
--------------------
BUILD_ID="20190408T233001Z"
Last Pass
---------
Timestamp/Logs
--------------
Issue appears to be with the L3/DHCP Agent rescheduling.
see /var/log/nfv-vim.log (attached) on controller-0 from the time of the lock
2019-04-09T20:44:09.122 controller-0 VIM_Thread[72780] INFO _network_rebalance.py.1065 Triggering L3 Agent reschedule for disabled l3 agent host: compute-2
2019-04-09T20:44:09.122 controller-0 VIM_Thread[72780] INFO _dhcp_rebalance.py.866 Triggering DHCP Agent reschedule for disabled dhcp agent host: compute-2
2019-04-09T20:44:15.477 controller-0 VIM_Network-Worker-0_Thread[160705] INFO _timer_scheduler.py.57 Not scheduling on time, elapsed=122412 ms.
2019-04-09T20:44:15.478 controller-0 VIM_Network-Worker-0_Thread[160705] INFO _thread.py.218 Thread Network-Worker-0: not scheduling on time
2019-04-09T20:44:15.478 controller-0 VIM_Network-Worker-0_Thread[160705] INFO _thread.py.218 Thread Network-Worker-0: not scheduling on time
2019-04-09T20:44:15.478 controller-0 VIM_Network-Worker-0_Thread[160705] INFO _thread.py.232 Thread Network-Worker-0: shutting down.
2019-04-09T20:44:15.478 controller-0 VIM_Network-Worker-0_Thread[160705] INFO _thread.py.236 Thread Network-Worker-0: shutdown.
2019-04-09T20:44:18.578 controller-0 VIM_Thread[72780] ERROR Caught exception while trying to add network to agent, error=[OpenStack Rest-API Exception: method=POST, url=http://neutron-server.openstack.svc.cluster.local:9696//v2.0/agents/b8a4a817-3f90-41b7-b846-e635b5304cac/dhcp-networks, headers={'Content-Type': 'application/json'}, body={"network_id": "3a0da180-b80c-4d94-b7cb-99292ff469f6"}, status_code=404, reason=HTTP Error 404: Not Found, response_headers=[('Date', 'Tue, 09 Apr 2019 20:44:18 GMT'), ('Content-Length', '165'), ('Content-Type', 'application/json'), ('X-Openstack-Request-Id', 'req-8b7b7cad-6918-40e4-bdf5-3c51bb94cbf0')], response_body={"NeutronError": {"message": "Agent b8a4a817-3f90-41b7-b846-e635b5304cac is not a valid DHCP Agent or has been disabled.", "type": "InvalidDHCPAgent", "detail": ""}}].
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/nfvi_network_api.py", line 1518, in add_network_to_dhcp_agent
future.result = (yield)
OpenStackRestAPIException: [OpenStack Rest-API Exception: method=POST, url=http://neutron-server.openstack.svc.cluster.local:9696//v2.0/agents/b8a4a817-3f90-41b7-b846-e635b5304cac/dhcp-networks, headers={'Content-Type': 'application/json'}, body={"network_id": "3a0da180-b80c-4d94-b7cb-99292ff469f6"}, status_code=404, reason=HTTP Error 404: Not Found, response_headers=[('Date', 'Tue, 09 Apr 2019 20:44:18 GMT'), ('Content-Length', '165'), ('Content-Type', 'application/json'), ('X-Openstack-Request-Id', 'req-8b7b7cad-6918-40e4-bdf5-3c51bb94cbf0')], response_body={"NeutronError": {"message": "Agent b8a4a817-3f90-41b7-b846-e635b5304cac is not a valid DHCP Agent or has been disabled.", "type": "InvalidDHCPAgent", "detail": ""}}]
2019-04-09T20:44:18.579 controller-0 VIM_Thread[72780] WARNING _dhcp_rebalance.py.390 Unable to add network to dhcp agent, response = {'completed': False, 'reason': '', 'result-data': ''}
also controller-1 nfv-vim.log
2019-04-09T20:43:57.778 controller-0 VIM_Thread[72780] INFO _host_state_disabling.py.27 Entering state (disabling) for compute-2.
2019-04-09T20:43:57.782 controller-0 VIM_Thread[72780] INFO _host.py.839 Host compute-2 FSM State-Change: prev_state=enabled, state=disabling, event=task-completed.
2019-04-09T20:43:57.784 controller-0 VIM_Thread[72780] INFO _host_director.py.475 Host compute-2 state change notification.
2019-04-09T20:43:57.811 controller-0 VIM_Event-Log_Thread[72818] INFO fm.py.379 Generated customer log, fm_uuid=6c380c47-ff5d-4e34-80ce-7e4977e51f21.
2019-04-09T20:43:57.892 controller-0 VIM_Thread[72780] INFO _hypervisor.py.133 Hypervisor compute-2 state change was unlocked-enabled now locked-enabled
2019-04-09T20:43:57.892 controller-0 VIM_Thread[72780] INFO _host_director.py.395 Notify other directors that the host compute-2 services are disabling.
2019-04-09T20:43:57.892 controller-0 VIM_Thread[72780] INFO _instance_director.py.1339 Host compute-2 services disabling.
2019-04-09T20:43:57.935 controller-0 VIM_Event-Log_Thread[72818] INFO fm.py.379 Generated customer log, fm_uuid=f7f39587-6445-413b-8fb8-0099f475f361.
2019-04-09T20:43:58.033 controller-0 VIM_Thread[72780] INFO _host_director.py.408 Notify other directors that the host compute-2 services are disabled.
2019-04-09T20:43:58.034 controller-0 VIM_Thread[72780] INFO _instance_director.py.1392 Host compute-2 services disabled.
2019-04-09T20:43:58.034 controller-0 VIM_Thread[72780] DEBUG _instance_director.py.1408 Canceling previous host operation host-disable, before continuing with host operation host-disable for compute-2.
2019-04-09T20:43:58.035 controller-0 VIM_Thread[72780] INFO _host_task_work.py.194 Queueing rebalance for host compute-2 disable
2019-04-09T20:43:58.119 controller-0 VIM_Thread[72780] INFO _hypervisor.py.133 Hypervisor compute-2 state change was locked-enabled now locked-disabled
2019-04-09T20:43:58.120 controller-0 VIM_Thread[72780] INFO _host_director.py.421 Notify other directors that the host compute-2 is disabled.
2019-04-09T20:43:58.120 controller-0 VIM_Thread[72780] INFO _instance_director.py.1427 Host compute-2 disabled.
2019-04-09T20:43:58.120 controller-0 VIM_Thread[72780] DEBUG _instance_director.py.1432 Canceling host operation host-disable for compute-2.
2019-04-09T20:43:58.121 controller-0 VIM_Thread[72780] DEBUG _host_tasks.py.287 Task (disable-host_compute-2) complete.
2019-04-09T20:43:58.121 controller-0 VIM_Thread[72780] INFO _host_state_disabling.py.36 Exiting state (disabling) for compute-2.
2019-04-09T20:43:58.121 controller-0 VIM_Thread[72780] INFO _host_state_disabled.py.29 Entering state (disabled) for compute-2.
2019-04-09T20:43:58.123 controller-0 VIM_Thread[72780] INFO _host.py.839 Host compute-2 FSM State-Change: prev_state=disabling, state=disabled, event=task-completed.
2019-04-09T20:43:58.125 controller-0 VIM_Thread[72780] INFO _host_director.py.475 Host compute-2 state change notification.
2019-04-09T20:43:58.161 controller-0 VIM_Event-Log_Thread[72818] INFO fm.py.379 Generated customer log, fm_uuid=14df24df-39a7-4770-bff4-34a51c7b4f90.
2019-04-09T20:44:07.959 controller-0 VIM_Thread[72780] INFO _vim_nfvi_audits.py.873 Audit instances called, timer_id=15.
2019-04-09T20:44:09.122 controller-0 VIM_Thread[72780] INFO _network_rebalance.py.1065 Triggering L3 Agent reschedule for disabled l3 agent host: compute-2
2019-04-09T20:44:09.122 controller-0 VIM_Thread[72780] INFO _dhcp_rebalance.py.866 Triggering DHCP Agent reschedule for disabled dhcp agent host: compute-2
2019-04-09T20:44:15.477 controller-0 VIM_Network-Worker-0_Thread[160705] INFO _timer_scheduler.py.57 Not scheduling on time, elapsed=122412 ms.
2019-04-09T20:44:15.478 controller-0 VIM_Network-Worker-0_Thread[160705] INFO _thread.py.218 Thread Network-Worker-0: not scheduling on time
2019-04-09T20:44:15.478 controller-0 VIM_Network-Worker-0_Thread[160705] INFO _thread.py.218 Thread Network-Worker-0: not scheduling on time
2019-04-09T20:44:15.478 controller-0 VIM_Network-Worker-0_Thread[160705] INFO _thread.py.232 Thread Network-Worker-0: shutting down.
2019-04-09T20:44:15.478 controller-0 VIM_Network-Worker-0_Thread[160705] INFO _thread.py.236 Thread Network-Worker-0: shutdown.
2019-04-09T20:44:18.578 controller-0 VIM_Thread[72780] ERROR Caught exception while trying to add network to agent, error=[OpenStack Rest-API Exception: method=POST, url=http://neutron-server.openstack.svc.cluster.local:9696//v2.0/agents/b8a4a817-3f90-41b7-b846-e635b5304cac/dhcp-networks, headers={'Content-Type': 'application/json'}, body={"network_id": "3a0da180-b80c-4d94-b7cb-99292ff469f6"}, status_code=404, reason=HTTP Error 404: Not Found, response_headers=[('Date', 'Tue, 09 Apr 2019 20:44:18 GMT'), ('Content-Length', '165'), ('Content-Type', 'application/json'), ('X-Openstack-Request-Id', 'req-8b7b7cad-6918-40e4-bdf5-3c51bb94cbf0')], response_body={"NeutronError": {"message": "Agent b8a4a817-3f90-41b7-b846-e635b5304cac is not a valid DHCP Agent or has been disabled.", "type": "InvalidDHCPAgent", "detail": ""}}].
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/nfvi_network_api.py", line 1518, in add_network_to_dhcp_agent
future.result = (yield)
OpenStackRestAPIException: [OpenStack Rest-API Exception: method=POST, url=http://neutron-server.openstack.svc.cluster.local:9696//v2.0/agents/b8a4a817-3f90-41b7-b846-e635b5304cac/dhcp-networks, headers={'Content-Type': 'application/json'}, body={"network_id": "3a0da180-b80c-4d94-b7cb-99292ff469f6"}, status_code=404, reason=HTTP Error 404: Not Found, response_headers=[('Date', 'Tue, 09 Apr 2019 20:44:18 GMT'), ('Content-Length', '165'), ('Content-Type', 'application/json'), ('X-Openstack-Request-Id', 'req-8b7b7cad-6918-40e4-bdf5-3c51bb94cbf0')], response_body={"NeutronError": {"message": "Agent b8a4a817-3f90-41b7-b846-e635b5304cac is not a valid DHCP Agent or has been disabled.", "type": "InvalidDHCPAgent", "detail": ""}}]
2019-04-09T20:44:18.579 controller-0 VIM_Thread[72780] WARNING _dhcp_rebalance.py.390 Unable to add network to dhcp agent, response = {'completed': False, 'reason': '', 'result-data': ''}
2019-04-09T20:44:21.383 controller-0 VIM_Thread[72780] ERROR Caught exception while trying to add network to agent, error=[OpenStack Rest-API Exception: method=POST, url=http://neutron-server.openstack.svc.cluster.local:9696//v2.0/agents/b8a4a817-3f90-41b7-b846-e635b5304cac/dhcp-networks, headers={'Content-Type': 'application/json'}, body={"network_id": "a8e24402-829a-4002-9824-dc944f0d6480"}, status_code=404, reason=HTTP Error 404: Not Found, response_headers=[('Date', 'Tue, 09 Apr 2019 20:44:21 GMT'), ('Content-Length', '165'), ('Content-Type', 'application/json'), ('X-Openstack-Request-Id', 'req-ba08279b-9712-4213-b952-ad937fab11c9')], response_body={"NeutronError": {"message": "Agent b8a4a817-3f90-41b7-b846-e635b5304cac is not a valid DHCP Agent or has been disabled.", "type": "InvalidDHCPAgent", "detail": ""}}].
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/nfvi_network_api.py", line 1518, in add_network_to_dhcp_agent
future.result = (yield)
OpenStackRestAPIException: [OpenStack Rest-API Exception: method=POST, url=http://neutron-server.openstack.svc.cluster.local:9696//v2.0/agents/b8a4a817-3f90-41b7-b846-e635b5304cac/dhcp-networks, headers={'Content-Type': 'application/json'}, body={"network_id": "a8e24402-829a-4002-9824-dc944f0d6480"}, status_code=404, reason=HTTP Error 404: Not Found, response_headers=[('Date', 'Tue, 09 Apr 2019 20:44:21 GMT'), ('Content-Length', '165'), ('Content-Type', 'application/json'), ('X-Openstack-Request-Id', 'req-ba08279b-9712-4213-b952-ad937fab11c9')], response_body={"NeutronError": {"message": "Agent b8a4a817-3f90-41b7-b846-e635b5304cac is not a valid DHCP Agent or has been disabled.", "type": "InvalidDHCPAgent", "detail": ""}}]
2019-04-09T20:44:21.383 controller-0 VIM_Thread[72780] WARNING _dhcp_rebalance.py.390 Unable to add network to dhcp agent, response = {'completed': False, 'reason': '', 'result-data': ''}
2019-04-09T20:44:24.963 controller-0 VIM_Thread[72780] ERROR Caught exception while trying to add router to agent, error=[OpenStack Rest-API Exception: method=POST, url=http://neutron-server.openstack.svc.cluster.local:9696//v2.0/agents/0342dcea-faee-4e13-beb3-41127ac33895/l3-routers, headers={'Content-Type': 'application/json'}, body={"router_id": "48fc7a78-e597-447d-9028-89c98c3f793a"}, status_code=404, reason=HTTP Error 404: Not Found, response_headers=[('Date', 'Tue, 09 Apr 2019 20:44:24 GMT'), ('Content-Length', '154'), ('Content-Type', 'application/json'), ('X-Openstack-Request-Id', 'req-1b5f2570-dfdb-45c6-838e-0399c744aab2')], response_body={"NeutronError": {"message": "Agent 0342dcea-faee-4e13-beb3-41127ac33895 is not a L3 Agent or has been disabled", "type": "InvalidL3Agent", "detail": ""}}].
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/nfvi_network_api.py", line 1444, in add_router_to_agent
future.result = (yield)
OpenStackRestAPIException: [OpenStack Rest-API Exception: method=POST, url=http://neutron-server.openstack.svc.cluster.local:9696//v2.0/agents/0342dcea-faee-4e13-beb3-41127ac33895/l3-routers, headers={'Content-Type': 'application/json'}, body={"router_id": "48fc7a78-e597-447d-9028-89c98c3f793a"}, status_code=404, reason=HTTP Error 404: Not Found, response_headers=[('Date', 'Tue, 09 Apr 2019 20:44:24 GMT'), ('Content-Length', '154'), ('Content-Type', 'application/json'), ('X-Openstack-Request-Id', 'req-1b5f2570-dfdb-45c6-838e-0399c744aab2')], response_body={"NeutronError": {"message": "Agent 0342dcea-faee-4e13-beb3-41127ac33895 is not a L3 Agent or has been disabled", "type": "InvalidL3Agent", "detail": ""}}]
2019-04-09T20:44:24.963 controller-0 VIM_Thread[72780] WARNING _network_rebalance.py.486 Unable to add router to l3 agent, response = {'completed': False, 'reason': '', 'result-data': ''}
2019-04-09T20:44:50.707 controller-0 VIM_Thread[72780] DEBUG _vim_nfvi_events.py.63 Host action, host_uuid=4fe5d32b-37e5-4a6c-938c-2f63df0d6dbc, host_name=compute-2, do_action=lock.\
...
2019-04-09T20:45:15.651 controller-0 VIM_Thread[72780] INFO _network_rebalance.py.1065 Triggering L3 Agent reschedule for disabled l3 agent host: compute-1
2019-04-09T20:45:15.652 controller-0 VIM_Thread[72780] INFO _dhcp_rebalance.py.866 Triggering DHCP Agent reschedule for disabled dhcp agent host: compute-1
2019-04-09T20:45:20.138 controller-0 VIM_Thread[72780] ERROR list index out of range
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/nfv_vim/vim.py", line 168, in process_main
timers.timers_schedule()
File "/usr/lib64/python2.7/site-packages/nfv_common/timers/_timer_module.py", line 88, in timers_schedule
_scheduler.schedule()
File "/usr/lib64/python2.7/site-packages/nfv_common/timers/_timer_scheduler.py", line 75, in schedule
rearm = timer.callback(now_ms)
File "/usr/lib64/python2.7/site-packages/nfv_common/timers/_timer.py", line 68, in callback
self._callback.send(self._timer_id)
File "/usr/lib64/python2.7/site-packages/nfv_vim/network_rebalance/_dhcp_rebalance.py", line 807, in _dr_timer
_run_state_machine()
File "/usr/lib64/python2.7/site-packages/nfv_vim/network_rebalance/_dhcp_rebalance.py", line 829, in _run_state_machine
_get_networks_on_agents()
File "/usr/lib64/python2.7/site-packages/nfv_vim/network_rebalance/_dhcp_rebalance.py", line 537, in _get_networks_on_agents
agent_id, host_name = _DHCPRebalance.get_current_dhcp_agent()
File "/usr/lib64/python2.7/site-packages/nfv_vim/network_rebalance/_dhcp_rebalance.py", line 111, in get_current_dhcp_agent
agent_id = self.dhcp_agents[self.dhcpagent_idx]['id']
IndexError: list index out of range
Test Activity
-------------
[Platform CPU testing]
Identical bugs in populate_ dhcp_agents in _dhcp_rebalance.py and populate_l3agents in _network_ rebalance. py, self.add_agent() should only be called for a down agent (if clause) or alive and admin_state_up agents (elif clause). Testing fix.