Comment 1 for bug 1824802

Slawek Kaplonski (slaweq) wrote :

Today I investigated the logs from http://logs.openstack.org/97/631497/7/check/tripleo-ci-centos-7-scenario007-standalone/42068d9/logs/undercloud/var/log/containers/neutron/dhcp-agent.log.txt.gz#_2019-04-10_14_18_36_934 once again.

It looks like the dnsmasq process for network cbc2d3df-fcae-42b3-9d9b-248526a1a2f1 was first started properly at 14:18:34.803: http://logs.openstack.org/97/631497/7/check/tripleo-ci-centos-7-scenario007-standalone/42068d9/logs/undercloud/var/log/containers/neutron/dhcp-agent.log.txt.gz#_2019-04-10_14_18_34_803

Then a "Trigger reload_allocations for port" message was logged at 14:18:36.890: http://logs.openstack.org/97/631497/7/check/tripleo-ci-centos-7-scenario007-standalone/42068d9/logs/undercloud/var/log/containers/neutron/dhcp-agent.log.txt.gz#_2019-04-10_14_18_36_890

That led to a reload of the dnsmasq process, which is done by sending SIGHUP. This happened because external_process.ProcessManager.enable() was called while the process was already active, so it called the reload_cfg() method instead of starting a new process. See https://github.com/openstack/neutron/blob/master/neutron/agent/linux/external_process.py#L80
It happened at 14:18:36.895: http://logs.openstack.org/97/631497/7/check/tripleo-ci-centos-7-scenario007-standalone/42068d9/logs/undercloud/var/log/containers/neutron/dhcp-agent.log.txt.gz#_2019-04-10_14_18_36_895
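
To make the branching described above easier to follow, here is a minimal sketch of it. This is a toy stand-in, not the actual Neutron code: the class name ToyProcessManager and its events list are made up for illustration, and the real ProcessManager takes more arguments and really spawns and signals a process.

```python
class ToyProcessManager:
    """Toy model of the enable()/reload_cfg() branching (hypothetical, not Neutron code)."""

    def __init__(self):
        self.active = False
        self.events = []  # records what would have been done to the process

    def enable(self, reload_cfg=False):
        if not self.active:
            # No running process yet: start one.
            self.events.append("spawn")
            self.active = True
        elif reload_cfg:
            # Process is already running: reload it instead of spawning again.
            self.reload_cfg()

    def reload_cfg(self):
        # The reload is modelled here as sending SIGHUP to the running process.
        self.events.append("SIGHUP")


pm = ToyProcessManager()
pm.enable()                 # first start (14:18:34.803 in the log)
pm.enable(reload_cfg=True)  # reload after "Trigger reload_allocations" (14:18:36.895)
print(pm.events)
```

In this model the second enable() call never spawns a new process, which matches the SIGHUP seen in the log.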

But then, for some reason, a full sync was triggered and SIGKILL was quickly sent to the same process. That was at 14:18:38.436: http://logs.openstack.org/97/631497/7/check/tripleo-ci-centos-7-scenario007-standalone/42068d9/logs/undercloud/var/log/containers/neutron/dhcp-agent.log.txt.gz#_2019-04-10_14_18_38_436

The next attempt to start the process was made at 14:18:38.883: http://logs.openstack.org/97/631497/7/check/tripleo-ci-centos-7-scenario007-standalone/42068d9/logs/undercloud/var/log/containers/neutron/dhcp-agent.log.txt.gz#_2019-04-10_14_18_38_883
And this one failed.
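
To show how close together these events are, here is a small sketch that computes the gaps between the timestamps quoted above. Only the timestamps come from the log; the event labels and the gaps() helper are my own shorthand.

```python
from datetime import datetime

# Timestamps as quoted from dhcp-agent.log above; labels are informal shorthand.
events = [
    ("dnsmasq started", "14:18:34.803"),
    ("reload_allocations logged", "14:18:36.890"),
    ("SIGHUP sent", "14:18:36.895"),
    ("SIGKILL sent", "14:18:38.436"),
    ("restart attempted (failed)", "14:18:38.883"),
]


def gaps(evts):
    """Return the seconds elapsed between each pair of consecutive events."""
    times = [datetime.strptime(t, "%H:%M:%S.%f") for _, t in evts]
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]


for (name, _), gap in zip(events[1:], gaps(events)):
    print(f"{name}: +{gap:.3f}s")
```

The SIGKILL arrives about 1.5 seconds after the SIGHUP, and the failed restart follows less than half a second later.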

To me it looks very similar to bug https://bugs.launchpad.net/neutron/+bug/1811126, which was fixed recently by https://github.com/openstack/neutron/commit/157e09e6af758b7669fbe5a8cdb0b1969f04661a

I'm not sure exactly which version of Neutron TripleO is using in this kind of job, but could you check whether it was run with this patch or still without it?