Today I once again investigated the logs from http://logs.openstack.org/97/631497/7/check/tripleo-ci-centos-7-scenario007-standalone/42068d9/logs/undercloud/var/log/containers/neutron/dhcp-agent.log.txt.gz#_2019-04-10_14_18_36_934
It looks like the dnsmasq process for network cbc2d3df-fcae-42b3-9d9b-248526a1a2f1 was first started properly at 14:18:34.803: http://logs.openstack.org/97/631497/7/check/tripleo-ci-centos-7-scenario007-standalone/42068d9/logs/undercloud/var/log/containers/neutron/dhcp-agent.log.txt.gz#_2019-04-10_14_18_34_803
Then a "Trigger reload_allocations for port" message was logged at 14:18:36.890: http://logs.openstack.org/97/631497/7/check/tripleo-ci-centos-7-scenario007-standalone/42068d9/logs/undercloud/var/log/containers/neutron/dhcp-agent.log.txt.gz#_2019-04-10_14_18_36_890
That led to a reload of the dnsmasq process, which is done by sending SIGHUP. This happened because external_process.ProcessManager.enable() was called while the process was already active, so it called the reload_cfg() method. See https://github.com/openstack/neutron/blob/master/neutron/agent/linux/external_process.py#L80 It happened at 14:18:36.895: http://logs.openstack.org/97/631497/7/check/tripleo-ci-centos-7-scenario007-standalone/42068d9/logs/undercloud/var/log/containers/neutron/dhcp-agent.log.txt.gz#_2019-04-10_14_18_36_895
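To make the branch described above concrete, here is a minimal sketch of that enable() logic. It is not the actual Neutron code: the class ProcessManagerSketch, the placeholder pid and the actions list are hypothetical, and no real signals are sent. It only records what would be done. The disable() method using SIGKILL is an assumption based on the SIGKILL seen in the log.

```python
import signal


class ProcessManagerSketch:
    """Hypothetical, simplified stand-in for external_process.ProcessManager."""

    def __init__(self):
        self.pid = None      # None means "no process running"
        self.actions = []    # record of what would have been done

    @property
    def active(self):
        return self.pid is not None

    def enable(self):
        if not self.active:
            # Process is not running: spawn it (placeholder pid, assumption).
            self.pid = 12345
            self.actions.append("spawn")
        else:
            # Process is already running: only reload its configuration.
            self.reload_cfg()

    def reload_cfg(self):
        # A reload is done by sending SIGHUP to the running process.
        self.actions.append(("signal", signal.SIGHUP))

    def disable(self):
        # Assumption: the process is stopped with SIGKILL, as seen in the log.
        if self.active:
            self.actions.append(("signal", signal.SIGKILL))
            self.pid = None


pm = ProcessManagerSketch()
pm.enable()   # first start
pm.enable()   # already active, so SIGHUP
pm.disable()  # SIGKILL, e.g. during a full sync
pm.enable()   # attempt to start again
print(pm.actions)
```

In this sketch the sequence start, reload, kill, start mirrors the order of events in the log. In the real job the last start is the one that failed.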
But then, for some reason, a full sync was triggered and SIGKILL was quickly sent to the same process. That was at 14:18:38.436: http://logs.openstack.org/97/631497/7/check/tripleo-ci-centos-7-scenario007-standalone/42068d9/logs/undercloud/var/log/containers/neutron/dhcp-agent.log.txt.gz#_2019-04-10_14_18_38_436
The next attempt to start the process was at 14:18:38.883: http://logs.openstack.org/97/631497/7/check/tripleo-ci-centos-7-scenario007-standalone/42068d9/logs/undercloud/var/log/containers/neutron/dhcp-agent.log.txt.gz#_2019-04-10_14_18_38_883 And this one failed.
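To show how close together these events are, the gaps between the timestamps quoted above can be computed with a short script. The event labels are my own shorthand; only the timestamps come from the log.

```python
from datetime import datetime

# Timestamps quoted from dhcp-agent.log; labels are informal shorthand.
events = [
    ("dnsmasq started", "14:18:34.803"),
    ("reload_allocations triggered", "14:18:36.890"),
    ("SIGHUP sent (reload)", "14:18:36.895"),
    ("SIGKILL sent (full sync)", "14:18:38.436"),
    ("restart attempted (failed)", "14:18:38.883"),
]


def gaps(evts):
    """Return (label, seconds since the previous event) for each later event."""
    times = [datetime.strptime(ts, "%H:%M:%S.%f") for _, ts in evts]
    return [
        (evts[i][0], (times[i] - times[i - 1]).total_seconds())
        for i in range(1, len(evts))
    ]


for label, delta in gaps(events):
    print(f"{label}: +{delta:.3f}s")
```

So the SIGKILL came about 1.5 seconds after the SIGHUP, and the failed restart less than half a second after the SIGKILL.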
To me it looks very similar to bug https://bugs.launchpad.net/neutron/+bug/1811126 which was fixed recently by https://github.com/openstack/neutron/commit/157e09e6af758b7669fbe5a8cdb0b1969f04661a
I'm not sure exactly which version of Neutron TripleO uses in this kind of job, but could you check whether it was run with this patch or still without it?