Comment 3 for bug 1641523

Aidin Alihodzic (ssbljk) wrote:

I need to add some more details, because there was an unusual setup on my side and this bug report may have been wrong in the first place.

I have been following the official guide for installing Newton on CentOS that can be found in the documentation.
Along the way I wrote each step as an Ansible script, so I can easily adjust it to my infrastructure and reproduce it in the future.
The most important part of this setup was that I wanted everything to work with SELinux and firewalld turned ON.

After many tests, I changed the setup bit by bit while testing as many things as I could.

I was intrigued why I got so many instances stuck in the "Deleting" state, so I investigated it a bit more today.

The problem occurred after I tested migration of instances between hosts (not live migration).
To implement migration I pretty much followed these two guides:
https://www.sebastien-han.fr/blog/2015/01/06/openstack-configure-vm-migrate-nova-ssh/
https://twiki.cern.ch/twiki/bin/view/Sandbox/GettingStartedwithOpenStack
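For reference, the core of what those guides set up can be sketched as an Ansible task list. This is only a sketch of the approach, not my actual playbook; the key file paths here are assumptions:

```yaml
# Sketch: give the nova user a login shell and a shared SSH identity
# so cold migration can copy disk files between compute hosts.
- name: Give the nova user a login shell
  user:
    name: nova
    shell: /bin/bash

- name: Install the shared private key for nova
  copy:
    src: files/nova_id_rsa            # assumed path, key generated beforehand
    dest: /var/lib/nova/.ssh/id_rsa
    owner: nova
    group: nova
    mode: "0600"

- name: Authorize the shared key on every compute host
  authorized_key:
    user: nova
    key: "{{ lookup('file', 'files/nova_id_rsa.pub') }}"

- name: Disable strict host key checking between compute hosts
  copy:
    dest: /var/lib/nova/.ssh/config
    owner: nova
    group: nova
    mode: "0600"
    content: |
      Host *
        StrictHostKeyChecking no
```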

Since the nova users on the hosts need SSH keys generated and exchanged between them, I wrote Ansible scripts to do that, and it turned out they would not work because of the following:
/var/lib/nova is the home directory of the nova user and carries the SELinux context nova_var_lib_t, and I found in audit.log that sshd refused to log the nova user in because of that.
So I changed the context to user_home_t while leaving the other directories at their default nova_var_lib_t, except .ssh, which had to be ssh_home_t.
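The context change itself looked roughly like the tasks below (a sketch, not my exact playbook; in hindsight the sefcontext-based form is what I should have used from the start, since a plain chcon does not survive a relabel):

```yaml
# Sketch: make the SELinux labels sshd expects persistent in the policy,
# then apply them. The sefcontext module writes semanage fcontext rules.
- name: Label nova's home so sshd accepts it as a home directory
  sefcontext:
    target: "/var/lib/nova"
    setype: user_home_t
    state: present

- name: Label .ssh with the context sshd expects for keys
  sefcontext:
    target: "/var/lib/nova/\\.ssh(/.*)?"
    setype: ssh_home_t
    state: present

- name: Apply the labels to the files on disk
  command: restorecon -Rv /var/lib/nova
```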

Everything worked until I rebooted the controller host (in this setup the controller host runs compute too, alongside two other compute nodes, with a Storwize as the backend for Cinder). So I suppose that during the various try/fail/success scenarios I had been running, I temporarily turned SELinux off with "setenforce 0", and enforcing mode came back after the reboot.

Today I found in the logs that nova-api complained about not being able to access the /var/lib/nova/keys directory because of the wrong context on /var/lib/nova. I changed it back from user_home_t to nova_var_lib_t and updated the Ansible scripts to do the same: generate and exchange the SSH keys, then restore the context of nova's home directory. With that, I no longer get those stuck instances, so I suppose this was the reason for the nova client crashes when I tried to delete some of the stuck instances.
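The fix as I now have it in the Ansible scripts can be sketched like this (assuming the default CentOS policy already maps /var/lib/nova to nova_var_lib_t, so dropping my override and relabeling is enough):

```yaml
# Sketch: remove the user_home_t override on nova's home, keep the
# ssh_home_t label on .ssh, and relabel back to the policy default.
- name: Drop the user_home_t override on /var/lib/nova
  sefcontext:
    target: "/var/lib/nova"
    setype: user_home_t
    state: absent

- name: Keep ssh_home_t on .ssh so the key exchange keeps working
  sefcontext:
    target: "/var/lib/nova/\\.ssh(/.*)?"
    setype: ssh_home_t
    state: present

- name: Relabel nova's home back to the policy default (nova_var_lib_t)
  command: restorecon -Rv /var/lib/nova
```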