I was investigating one of the jobs that was about to time out:
http://logs.openstack.org/36/291136/2/check-tripleo/gate-tripleo-ci-f22-ceph/617cc22//console.html
The compute node had deployed, but was not picking up or applying any SoftwareDeployments.
I also could not ssh into the node's IP (192.0.2.9), nor ping it.
Eventually the job timed out.
I saved the VM's disk for investigation afterwards.
It looks like nic5 was not mapped to eth4 by os-net-config as expected, and os-net-config then failed with a traceback while trying to add the nonexistent interface "nic5" to the bridge:
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: Traceback (most recent call last):
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: File "/usr/bin/os-net-config", line 10, in <module>
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: sys.exit(main())
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: File "/usr/lib/python2.7/site-packages/os_net_config/cli.py", line 185, in main
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: provider.add_object(obj)
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: File "/usr/lib/python2.7/site-packages/os_net_config/__init__.py", line 52, in add_object
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: self.add_bridge(obj)
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: File "/usr/lib/python2.7/site-packages/os_net_config/impl_ifcfg.py", line 260, in add_bridge
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: data = self._add_common(bridge)
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: File "/usr/lib/python2.7/site-packages/os_net_config/impl_ifcfg.py", line 112, in _add_common
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: mac = utils.interface_mac(base_opt.primary_interface_name)
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: File "/usr/lib/python2.7/site-packages/os_net_config/utils.py", line 46, in interface_mac
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: with open('/sys/class/net/%s/address' % name, 'r') as f:
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: IOError: [Errno 2] No such file or directory: '/sys/class/net/nic5/address'
Mar 10 15:03:50 overcloud-novacompute-0.localdomain os-collect-config[2280]: + RETVAL=1
os-net-config then reverted to the fallback mode triggered via os-refresh-config/configure.d/20-os-net-config, and that left the node unable to reach its gateway, 192.0.2.1, at all.
You can see this in the os-collect-config output, which fails to collect any metadata from this point on.
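
To illustrate the failure in the traceback above, here is a minimal sketch of the two steps involved: resolving a "nicN" alias to a real interface name, and reading that interface's MAC address from sysfs. This is not os-net-config's actual code. The function resolve_nic_alias, its ordered_nics argument, and the LookupError handling are assumptions made for illustration; only the /sys/class/net/<name>/address path and the interface_mac name come from the traceback.

```python
import os
import re

SYS_CLASS_NET = '/sys/class/net'  # sysfs root seen in the traceback


def resolve_nic_alias(name, ordered_nics):
    """Map a 'nicN' alias to the N-th entry (1-based) of ordered_nics.

    Hypothetical helper: names that are not of the form 'nicN' are
    returned unchanged; an alias with no matching entry raises
    LookupError instead of being passed through as a literal name.
    """
    match = re.match(r'^nic(\d+)$', name)
    if match is None:
        return name
    index = int(match.group(1)) - 1
    if 0 <= index < len(ordered_nics):
        return ordered_nics[index]
    raise LookupError('no interface available for alias %r (found %d)'
                      % (name, len(ordered_nics)))


def interface_mac(name, sys_root=SYS_CLASS_NET):
    """Read an interface's MAC address, failing clearly if it is missing."""
    path = os.path.join(sys_root, name, 'address')
    if not os.path.isfile(path):
        raise LookupError('interface %r not found under %s'
                          % (name, sys_root))
    with open(path, 'r') as f:
        return f.read().strip()
```

With five detected interfaces, resolve_nic_alias('nic5', ['eth0', 'eth1', 'eth2', 'eth3', 'eth4']) returns 'eth4'. With only four, it raises LookupError rather than letting the unresolved name "nic5" reach the sysfs lookup, which is the point where the IOError above was raised.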
The full system journal from the failed compute node.
It shows the os-net-config output from os-collect-config.