Comment 1 for bug 1754327

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I was checking logs from such failed test: http://logs.openstack.org/07/525607/12/check/neutron-tempest-plugin-scenario-linuxbridge/09f04f9/logs/testr_results.html.gz

Instance checked: c6c1cdcf-c8ba-414d-a7e8-c7ac5e54e191

Instance fixed IP 10.1.0.4,
Floating IP: 172.24.5.14
Port's MAC: fa:16:3e:f7:57:4d

Timeline of events there is:
1. Instance spawned and paused by nova at Mar 06 19:52:59.027047: http://logs.openstack.org/07/525607/12/check/neutron-tempest-plugin-scenario-linuxbridge/09f04f9/logs/screen-n-cpu.txt.gz?level=INFO#_Mar_06_19_52_59_027047
2. DHCP agent send info that port is ready at Mar 06 19:52:42.035366 (but port is unbound then so it probably was created before instance)
3. Neutron sends notification to nova that port is active at Mar 06 19:52:45.006898: http://logs.openstack.org/07/525607/12/check/neutron-tempest-plugin-scenario-linuxbridge/09f04f9/logs/screen-q-svc.txt.gz#_Mar_06_19_52_45_006898
4. Instance started and paused by nova-compute at Mar 06 19:52:59.027047: http://logs.openstack.org/07/525607/12/check/neutron-tempest-plugin-scenario-linuxbridge/09f04f9/logs/screen-n-cpu.txt.gz?level=INFO#_Mar_06_19_52_59_027047
5. Instance resumed by nova-compute at Mar 06 19:53:02.159936: http://logs.openstack.org/07/525607/12/check/neutron-tempest-plugin-scenario-linuxbridge/09f04f9/logs/screen-n-cpu.txt.gz?level=INFO#_Mar_06_19_53_02_159936
6. Tempest starts trying to ssh to instance at 2018-03-06 19:53:08.151: http://logs.openstack.org/07/525607/12/check/neutron-tempest-plugin-scenario-linuxbridge/09f04f9/logs/tempest.txt.gz?level=INFO#_2018-03-06_19_53_08_151
7. First DHCPREQUEST logged in dhcp agent at Mar 06 19:57:45.087006: http://logs.openstack.org/07/525607/12/check/neutron-tempest-plugin-scenario-linuxbridge/09f04f9/logs/screen-q-dhcp.txt.gz#_Mar_06_19_57_45_087006
8. Request to delete instance (failed test) sent at: 2018-03-06 19:59:56.375330: http://logs.openstack.org/07/525607/12/check/neutron-tempest-plugin-scenario-linuxbridge/09f04f9/job-output.txt.gz#_2018-03-06_19_59_56_375330

So it looks like there is no problem with communication between services or with some "lag" during configuration of port by l2/dhcp agent.

I also checked directly on test node (thanks infra team) that after some time each of such instances are ready.
So I sent patch https://review.openstack.org/#/c/549324/ and it looks that tests are passing then but in many cases additional check_connectivity is called (so it would fail with timeout in normal case).