Brief Description
-----------------
Host lock failed as nfv-vim was hanging and thus did not process the request from sysinv.
Severity
--------
Major
Steps to Reproduce
------------------
Bring up an AIOSX
Apply stx-openstack
Provision quotas, images, networks, etc...
Launch some VMs
Delete and relaunch VMs. This process was repeated several times using different images
Lock host in order to allocate more disk to nova-local
Expected Behavior
------------------
Host was successfully locked
Actual Behavior
----------------
Host lock failed as notification message from sysinv to vim was not acknowledged and processed.
Based on nfv-vim log, the process crashed days ago while trying to its database
2020-01-28T00:49:41.253 controller-0 VIM_Thread[632016] ERROR (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (sqlite3.OperationalError) unable to open database file [SQL: u'UPDATE hosts_v6 SET uptime=?, elapsed_time_in_state=? WHERE hosts_v6.uuid = ?'] [parameters: (283085, 277980, u'6765bfd1-c298-4b05-8ed9-9902d4c80f70')]
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/nfv_vim/vim.py", line 167, in process_main
File "/usr/lib64/python2.7/site-packages/nfv_common/selobj/_selobj_module.py", line 105, in selobj_dispatch
File "/usr/lib64/python2.7/site-packages/nfv_common/tasks/_task_scheduler.py", line 332, in task_work_complete
File "/usr/lib64/python2.7/site-packages/nfv_common/tasks/_task.py", line 193, in task_work_complete
File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/nfvi_infrastructure_api.py", line 495, in get_hosts
File "/usr/lib64/python2.7/site-packages/nfv_vim/audits/_vim_nfvi_audits.py", line 148, in _audit_nfvi_hosts_callback
File "/usr/lib64/python2.7/site-packages/nfv_vim/objects/_host.py", line 721, in nfvi_host_update
File "/usr/lib64/python2.7/site-packages/nfv_vim/objects/_host.py", line 863, in _persist
File "/usr/lib64/python2.7/site-packages/nfv_vim/database/_database_infrastructure_module.py", line 68, in database_host_add
File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2755, in first
File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2547, in __getitem__
File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2854, in __iter__
File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1407, in _autoflush
File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1397, in _autoflush
File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2171, in flush
File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2291, in _flush
File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2255, in _flush
File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 389, in execute
File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 548, in execute
File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 177, in save_obj
File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 737, in _emit_update_statements
File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 945, in execute
File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1402, in _handle_dbapi_exception
File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
OperationalError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (sqlite3.OperationalError) unable to open database file [SQL: u'UPDATE hosts_v6 SET uptime=?, elapsed_time_in_state=? WHERE hosts_v6.uuid = ?'] [parameters: (283085, 277980, u'6765bfd1-c298-4b05-8ed9-9902d4c80f70')]
When vim crashes it is supposed to shutdown its threads and exit.
The code is in nfv_vim/vim.py and looks like this:
finally: open(PROCESS_NOT_RUNNING_FILE, 'w').close()
# Allow up to 10 seconds for the process to shut down. If the
# process_finalize hangs, we will do a hard exit. signal.signal(signal.SIGALRM, _force_exit) signal.alarm(10) process_finalize()
What could happen was the open hung due to lack of file descriptors so the failsafe mechanism was never activated.
The output of lsof (see attached) shows that most of the open files look something like this
nfv-vim 632016 root DEL REG 0,18 107932251 /dev/shm/sem.SVb1Kq
/dev/shm/sem.* would correspond to a named POSIX semaphore opened via the sem_open() library call. It seems likely that a code path is not calling sem_close() (and maybe sem_unlink())when it should be.
Reproducibility
---------------
Seen once
System Configuration
--------------------
IPv4 StarlingX with OpenStack
Brief Description
-----------------
Host lock failed as nfv-vim was hanging and thus did not process the request from sysinv.
Severity
--------
Major
Steps to Reproduce
------------------
Bring up an AIOSX
Apply stx-openstack
Provision quotas, images, networks, etc...
Launch some VMs
Delete and relaunch VMs. This process was repeated several times using different images
Lock host in order to allocate more disk to nova-local
Expected Behavior
------------------
Host was successfully locked
Actual Behavior
----------------
Host lock failed as notification message from sysinv to vim was not acknowledged and processed.
Based on nfv-vim log, the process crashed days ago while trying to its database
2020-01- 28T00:49: 41.253 controller-0 VIM_Thread[632016] ERROR (raised as a result of Query-invoked autoflush; consider using a session. no_autoflush block if this flush is occurring prematurely) (sqlite3. OperationalErro r) unable to open database file [SQL: u'UPDATE hosts_v6 SET uptime=?, elapsed_ time_in_ state=? WHERE hosts_v6.uuid = ?'] [parameters: (283085, 277980, u'6765bfd1- c298-4b05- 8ed9-9902d4c80f 70')] python2. 7/site- packages/ nfv_vim/ vim.py" , line 167, in process_main python2. 7/site- packages/ nfv_common/ selobj/ _selobj_ module. py", line 105, in selobj_dispatch python2. 7/site- packages/ nfv_common/ tasks/_ task_scheduler. py", line 332, in task_work_complete python2. 7/site- packages/ nfv_common/ tasks/_ task.py" , line 193, in task_work_complete python2. 7/site- packages/ nfv_plugins/ nfvi_plugins/ nfvi_infrastruc ture_api. py", line 495, in get_hosts python2. 7/site- packages/ nfv_vim/ audits/ _vim_nfvi_ audits. py", line 148, in _audit_ nfvi_hosts_ callback python2. 7/site- packages/ nfv_vim/ objects/ _host.py" , line 721, in nfvi_host_update python2. 7/site- packages/ nfv_vim/ objects/ _host.py" , line 863, in _persist python2. 7/site- packages/ nfv_vim/ database/ _database_ infrastructure_ module. py", line 68, in database_host_add python2. 7/site- packages/ sqlalchemy/ orm/query. py", line 2755, in first python2. 7/site- packages/ sqlalchemy/ orm/query. py", line 2547, in __getitem__ python2. 7/site- packages/ sqlalchemy/ orm/query. py", line 2854, in __iter__ python2. 7/site- packages/ sqlalchemy/ orm/session. py", line 1407, in _autoflush python2. 7/site- packages/ sqlalchemy/ util/compat. py", line 203, in raise_from_cause python2. 7/site- packages/ sqlalchemy/ orm/session. py", line 1397, in _autoflush python2. 7/site- packages/ sqlalchemy/ orm/session. py", line 2171, in flush python2. 7/site- packages/ sqlalchemy/ orm/session. py", line 2291, in _flush python2. 7/site- packages/ sqlalchemy/ util/langhelper s.py", line 66, in __exit__ python2. 7/site- packages/ sqlalchemy/ orm/session. py", line 2255, in _flush python2. 7/site- packages/ sqlalchemy/ orm/unitofwork. py", line 389, in execute python2. 7/site- packages/ sqlalchemy/ orm/unitofwork. py", line 548, in execute python2. 7/site- packages/ sqlalchemy/ orm/persistence .py", line 177, in save_obj python2. 7/site- packages/ sqlalchemy/ orm/persistence .py", line 737, in _emit_update_ statements python2. 7/site- packages/ sqlalchemy/ engine/ base.py" , line 945, in execute python2. 7/site- packages/ sqlalchemy/ sql/elements. py", line 263, in _execute_ on_connection python2. 7/site- packages/ sqlalchemy/ engine/ base.py" , line 1053, in _execute_ clauseelement python2. 7/site- packages/ sqlalchemy/ engine/ base.py" , line 1189, in _execute_context python2. 7/site- packages/ sqlalchemy/ engine/ base.py" , line 1402, in _handle_ dbapi_exception python2. 7/site- packages/ sqlalchemy/ util/compat. py", line 203, in raise_from_cause python2. 7/site- packages/ sqlalchemy/ engine/ base.py" , line 1182, in _execute_context python2. 7/site- packages/ sqlalchemy/ engine/ default. py", line 470, in do_execute no_autoflush block if this flush is occurring prematurely) (sqlite3. OperationalErro r) unable to open database file [SQL: u'UPDATE hosts_v6 SET uptime=?, elapsed_ time_in_ state=? WHERE hosts_v6.uuid = ?'] [parameters: (283085, 277980, u'6765bfd1- c298-4b05- 8ed9-9902d4c80f 70')]
Traceback (most recent call last):
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
File "/usr/lib64/
OperationalError: (raised as a result of Query-invoked autoflush; consider using a session.
When vim crashes it is supposed to shutdown its threads and exit.
The code is in nfv_vim/vim.py and looks like this:
finally:
open(PROCESS_ NOT_RUNNING_ FILE, 'w').close()
signal. signal( signal. SIGALRM, _force_exit)
signal. alarm(10)
process_ finalize( )
# Allow up to 10 seconds for the process to shut down. If the
# process_finalize hangs, we will do a hard exit.
What could happen was the open hung due to lack of file descriptors so the failsafe mechanism was never activated.
The output of lsof (see attached) shows that most of the open files look something like this
nfv-vim 632016 root DEL REG 0,18 107932251 /dev/shm/sem.SVb1Kq
/dev/shm/sem.* would correspond to a named POSIX semaphore opened via the sem_open() library call. It seems likely that a code path is not calling sem_close() (and maybe sem_unlink())when it should be.
Reproducibility
---------------
Seen once
System Configuration ------- ------
-------
IPv4 StarlingX with OpenStack
Branch/Pull Time/Commit ------- ------- --
-------
Jan. 21st master
Last Pass
---------
Not sure
Timestamp/Logs
--------------
See nfv-vim.log, lsof_output, sysinv.log attached
Test Activity
-------------
Evaluation
Workaround
----------
kill nfv-vim process