Comment 0 for bug 1833323

Brent Rowsell (brent-rowsell) wrote : OpenStack manifest apply hung applying cinder manifest

Brief Description
-----------------
I was applying the stx-openstack application, and the apply was stuck on the cinder chart.
Upon investigation, it appears the armada container was terminated when sysinv restarted after SM detected an audit failure on mgr-restful-plugin.

Manual database recovery was required, which is not an acceptable solution. The application framework must gracefully handle process restarts.
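One possible way to meet this requirement is for the conductor to treat any application found in a transitional state at startup as interrupted, and move it to a failure state the user can retry from. The Python sketch below is hypothetical: `App`, `recover_interrupted_apps`, the status strings and the in-memory dict are illustrative stand-ins for sysinv's real database model and constants, which may differ.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical status strings; the real sysinv constants may differ.
APP_APPLY_IN_PROGRESS = "applying"
APP_APPLY_FAILURE = "apply-failed"
APP_REMOVE_IN_PROGRESS = "removing"
APP_REMOVE_FAILURE = "remove-failed"

# Transitional states that cannot legitimately survive a process restart,
# mapped to the failure state the app should be moved to.
RECOVERY_MAP = {
    APP_APPLY_IN_PROGRESS: APP_APPLY_FAILURE,
    APP_REMOVE_IN_PROGRESS: APP_REMOVE_FAILURE,
}


@dataclass
class App:
    """Hypothetical stand-in for a persisted application record."""
    name: str
    status: str
    progress: str = ""


def recover_interrupted_apps(apps: Dict[str, App]) -> List[str]:
    """Run once at conductor startup, before accepting new requests.

    An app still in a transitional state was interrupted when the previous
    process exited, so its in-flight work (e.g. the armada container) is
    gone. Mark it failed so the operation can be re-run instead of the
    database being edited by hand. Returns the names of recovered apps.
    """
    recovered = []
    for app in apps.values():
        target = RECOVERY_MAP.get(app.status)
        if target is not None:
            app.status = target
            app.progress = "operation aborted by conductor restart, please retry"
            recovered.append(app.name)
    return recovered


if __name__ == "__main__":
    demo = {
        "stx-openstack": App("stx-openstack", APP_APPLY_IN_PROGRESS,
                             "processing chart: osh-openstack-cinder"),
    }
    print(recover_interrupted_apps(demo))  # ['stx-openstack']
    print(demo["stx-openstack"].status)    # apply-failed
```

Mapping to a failure state, rather than silently resuming the apply, keeps the recovery step simple and leaves the decision to retry with the user.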

Severity
--------
Major

Steps to Reproduce
------------------
See above

Expected Behavior
------------------
Application applied without errors

Actual Behavior
----------------
See above

Reproducibility
---------------
Seen once so far

System Configuration
--------------------
AIO-SX, low latency profile

Branch/Pull Time/Commit
-----------------------
"2019-06-03 18:37:28"
Tarball built on June 4th

Last Pass
---------
Same load lineup

Timestamp/Logs
--------------
2019-06-18T18:27:31.000 controller-0 bash: info HISTORY: PID=147344 UID=0 system application-apply stx-openstack
2019-06-18 18:27:32.356 92510 INFO sysinv.conductor.kube_app [-] Application (stx-openstack) apply started.
2019-06-18 18:56:48.881 92510 INFO sysinv.conductor.kube_app [-] processing chart: osh-openstack-cinder, overall completion: 73.0%

| 2019-06-18T18:24:17.497 | 262 | service-group-scn | vim-services | go-active | active |
| 2019-06-18T18:59:44.939 | 263 | service-scn | mgr-restful-plugin | enabled-active | disabling | audit failed
| 2019-06-18T18:59:45.288 | 264 | service-scn | ceph-manager | enabled-active | disabling | disable state requested
| 2019-06-18T18:59:45.290 | 265 | service-scn | sysinv-conductor | enabled-active | disabling | disable state requested
| 2019-06-18T18:59:45.290 | 266 | service-scn | sysinv-inv | enabled-active | disabling | disable state requested

sysinv restarted:
2019-06-18 18:59:45.415 92988 INFO oslo_service.service [-] Caught SIGTERM, stopping children
2019-06-18 18:59:45.416 92988 INFO oslo.service.wsgi [-] Stopping WSGI server.
2019-06-18 18:59:45.416 92988 INFO oslo_service.service [-] Waiting on 1 children to exit
2019-06-18 18:59:45.416 99204 INFO oslo.service.wsgi [-] Stopping WSGI server.
2019-06-18 18:59:45.431 92988 INFO oslo_service.service [-] Child 99204 exited with status 0
2019-06-18 18:59:45.432 92988 INFO oslo_service.service [-] Caught SIGTERM, stopping children
2019-06-18 18:59:45.433 92988 INFO oslo.service.wsgi [-] Stopping WSGI server.
2019-06-18 18:59:45.433 92988 INFO oslo_service.service [-] Waiting on 1 children to exit
2019-06-18 18:59:45.433 99229 INFO oslo.service.wsgi [-] Stopping WSGI server.
2019-06-18 18:59:45.437 92988 INFO oslo_service.service [-] Child 99229 exited with status 0
2019-06-18 18:59:45.538 92510 INFO sysinv.conductor.kube_app [-] Exiting progress monitoring thread for app stx-openstack
2019-06-18 18:59:45.539 92510 INFO sysinv.openstack.common.service [-] Caught SIGTERM, exiting
2019-06-18 19:00:44.954 13035 INFO sysinv.agent.manager [-] ilvg_get_nova_ilvg_by_ihost() Timeout.
2019-06-18 19:00:44.961 13035 INFO sysinv.openstack.common.rpc.common [-] Connected to AMQP server on 192.168.204.2:5672

Test Activity
-------------
Other