I need to add some more details, because there was some unusual setup on my side and this bug report may have been wrong in the first place.
I followed the official guide for installing Newton on CentOS, as found in the documentation.
Along the way I wrote each step as an Ansible script so I can easily adapt it to my infrastructure and reproduce it in the future.
The most important part of this setup was that I wanted everything to work with SELinux and firewalld turned ON.
After many tests, I changed the setup bit by bit while testing as many things as I could.
I was intrigued by why I got so many instances stuck in the "Deleting" state, so I investigated it a bit more today.
The problem occurred after I tested migration of instances between hosts (not live migration). To implement migration I pretty much followed these two guides:
https://www.sebastien-han.fr/blog/2015/01/06/openstack-configure-vm-migrate-nova-ssh/
https://twiki.cern.ch/twiki/bin/view/Sandbox/GettingStartedwithOpenStack
Since migration requires the nova users to have SSH keys generated and exchanged between the hosts, I wrote Ansible scripts to do that, and it turned out they would not work, for the following reason:
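For reference, the manual steps my Ansible scripts automate look roughly like this (a sketch only; "compute2" is a placeholder hostname, and the paths assume the default nova home directory):

```shell
# Generate a passwordless SSH key for the nova user (run as root on each host).
sudo -u nova ssh-keygen -t rsa -N '' -f /var/lib/nova/.ssh/id_rsa

# Install the public key on the other compute hosts so nova can ssh between
# them; ssh-copy-id appends it to ~nova/.ssh/authorized_keys on the target.
ssh-copy-id -i /var/lib/nova/.ssh/id_rsa.pub nova@compute2
```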
/var/lib/nova is the home directory of the nova user and had the SELinux context nova_var_lib_t; I found in audit.log that SELinux would not let the nova user log in because of that.
So I changed the context of /var/lib/nova to user_home_t while leaving the other directories at their default context, nova_var_lib_t, except for .ssh, which had to be ssh_home_t.
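For the record, the persistent way to apply such a context change (so that a relabel does not revert it) would be roughly the following; this is a sketch of what the report describes, not necessarily the exact commands used, and it requires an SELinux-enabled host with policycoreutils installed:

```shell
# Label nova's home directory itself as a user home dir...
semanage fcontext -a -t user_home_t '/var/lib/nova'
# ...and its .ssh directory as ssh_home_t so sshd may read authorized_keys.
semanage fcontext -a -t ssh_home_t '/var/lib/nova/.ssh(/.*)?'
# Apply the rules; everything else under /var/lib/nova keeps its default
# nova_var_lib_t label from the base policy.
restorecon -Rv /var/lib/nova

# Verify the resulting labels.
ls -Zd /var/lib/nova /var/lib/nova/.ssh
```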
Everything worked until I rebooted the controller host (in this setup I have one controller host that also runs compute, two other compute nodes, and a Storwize as the backend for Cinder). I suppose that somewhere along the various try/fail/succeed scenarios I had been running, I temporarily turned SELinux off with "setenforce 0", and enforcing mode came back after the reboot.
Today I found in the logs that nova-api complained about not being able to access the /var/lib/nova/keys directory because of the wrong context on /var/lib/nova. I changed the context back from user_home_t to nova_var_lib_t and updated the Ansible scripts to do the same: generate and exchange the SSH keys, then restore the context of nova's home directory. Since then I have not gotten any stuck instances, so I suppose this was also the reason for the nova client crashes when I tried to delete some of the stuck instances.
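The checks and the revert described above can be sketched as follows (again only a sketch; it assumes an SELinux-enabled host with auditd running, and that the earlier user_home_t rule was added with semanage):

```shell
# Look for today's SELinux AVC denials and filter for nova-related entries.
ausearch -m avc --start today | grep nova

# Drop the custom user_home_t rule and restore the policy default
# (nova_var_lib_t) on /var/lib/nova and everything beneath it.
semanage fcontext -d '/var/lib/nova'
restorecon -Rv /var/lib/nova
```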