rebooting a centos compute node loses /var/lock/nova

Bug #1833066 reported by Jonathan Rosser
This bug affects 2 people
Affects: OpenStack-Ansible
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

On CentOS 7 and later, /var/lock is a symlink to /run/lock, and /run is a tmpfs, so anything created under /var/lock at deploy time, including the nova lock directory, is gone once the node is rebooted.
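
To check this on a given host, a minimal sketch along the following lines may help. The function name describe_path is hypothetical, and the stat format string assumes GNU coreutils. On an affected host one would expect /var/lock to resolve to /run/lock on a tmpfs.

```shell
#!/bin/sh
# Hypothetical helper: print where a path really lives and the
# filesystem type behind it.
describe_path() {
    # Follow symlinks to the real location.
    resolved=$(readlink -f "$1")
    # stat -f reports on the filesystem; %T is its type name (GNU coreutils).
    fstype=$(stat -f -c %T "$resolved" 2>/dev/null || echo unknown)
    echo "path=$1 resolved=$resolved fstype=$fstype"
}

describe_path /var/lock
```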

A user reported this in #openstack-ansible for the Stein release on CentOS 7:

10:43 AM <key-networks> G'day folks. Just wondering if you do any reboot tests in your CI testing? On Friday I deployed stein AIO on CentOS 7. It worked fine until I rebooted the host and restarted galera cluster - nova-compute was dead. The nova-compute log said Permission denied: '/var/lock/nova' - the directory did not exist, so I followed https://bugs.launchpad.net/openstack-ansible/+bug/1636604 and recreated it and restarted nova-compute successfully. After another reboot, I checked that nova-compute was running before restarting galera cluster. After restarting galera, nova-compute died and /var/lock/nova disappeared again. I then scrapped CentOS 7, installed Ubuntu 18.04 and have had a lot more success since then.

10:43 AM <@openstack> Launchpad bug 1636604 in openstack-ansible "Nova fails to launch any instances after the compute host is rebooted" [Undecided,Fix released] - Assigned to Paulo Matias (paulo-matias)

Chris Smart (csmart) wrote :

Creating the following file should work around it for now:

cat << EOF | sudo tee /usr/lib/tmpfiles.d/nova.conf
D /var/lock/nova 2770 root nova
EOF

This tells systemd-tmpfiles to create that directory on boot (/var/lock now lives under /run, which is a tmpfs).
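
The snippet can probably be applied straight away rather than waiting for the next reboot. The following is a rough sketch, not a tested procedure: apply_tmpfiles is a hypothetical wrapper, and it assumes sudo access and the file path from the workaround above.

```shell
#!/bin/sh
# Hypothetical wrapper: apply one tmpfiles.d snippet now and show the result.
apply_tmpfiles() {
    conf=$1
    if [ ! -f "$conf" ]; then
        echo "skipped: $conf not found"
        return 0
    fi
    if ! command -v systemd-tmpfiles >/dev/null 2>&1; then
        echo "skipped: systemd-tmpfiles not installed"
        return 0
    fi
    # --create processes the given config file and creates the paths it lists.
    sudo systemd-tmpfiles --create "$conf" && ls -ld /var/lock/nova
}

apply_tmpfiles /usr/lib/tmpfiles.d/nova.conf
```

If the directory shows up with the expected mode and group, the entry itself is fine and the cause of the disappearance is likely elsewhere.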

Chris Smart (csmart) wrote :

Hmmm... I might be leading you astray there... After I created that file I did get /var/lock/nova, but looking into the code, a tmpfiles.d entry for this already exists in /etc/tmpfiles.d/openstack-nova-compute.conf and is correctly configured.

I also noticed that /var/lock/nova-compute is created, which is also specified in that file.
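
To see which tmpfiles.d entries already cover the nova directories, something like the sketch below could be used. find_tmpfiles_entries is a hypothetical helper, and the three directories are the standard tmpfiles.d locations. Note that a file in /etc/tmpfiles.d overrides a file of the same name in /usr/lib/tmpfiles.d, while differently named files are all applied.

```shell
#!/bin/sh
# Hypothetical helper: list tmpfiles.d lines that mention a pattern,
# searching each directory that exists.
find_tmpfiles_entries() {
    pattern=$1
    shift
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # -H and -n prefix each match with its file name and line number.
        grep -rnH -- "$pattern" "$dir" || true
    done
}

find_tmpfiles_entries nova /etc/tmpfiles.d /run/tmpfiles.d /usr/lib/tmpfiles.d
```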

For me, after a reboot cinder had failed (because it did not wait for /openstack), libvirtd had also failed, and because libvirtd failed, so had nova-compute. I fixed those up at the same time, so I wonder whether the tmpfiles setup was working all along and the directory was removed when nova-compute cleaned up after itself.

I need to dig into this a little more...
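
As a starting point for that digging, a quick look at the unit states after a reboot might separate a missing lock directory from a cascading service failure. This is only a sketch: check_units is a hypothetical helper, and the unit names libvirtd and nova-compute are assumptions that may differ per deployment.

```shell
#!/bin/sh
# Hypothetical helper: list failed units, then the state of each named unit.
check_units() {
    # /run/systemd/system exists only when systemd is the running init.
    if [ ! -d /run/systemd/system ]; then
        echo "systemd is not running here"
        return 0
    fi
    echo "failed units:"
    systemctl --failed --no-pager --no-legend
    for unit in "$@"; do
        # is-active exits non-zero for inactive units, so ignore its status.
        state=$(systemctl is-active "$unit" 2>/dev/null || true)
        echo "$unit: ${state:-unknown}"
    done
}

check_units libvirtd nova-compute
```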

Chris Smart (csmart)
Changed in openstack-ansible:
status: New → Confirmed