Tempest scenario jobs failing due to no FIP connectivity
Bug #1754327 reported by
Slawek Kaplonski
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron |
Fix Released
|
High
|
Slawek Kaplonski |
Bug Description
It is quite often (especially for linuxbridge scenario job) that some tests (random) are failing because ssh to instance is not possible.
Example of such failed tests: http://
Same issue appears sometimes in dvr scenario job but it is not so often probably because it is multinode job and load on host is maybe lower so instances can boot faster.
tags: | added: neutron-proactive-backport-potential |
To post a comment you must log in.
I was checking logs from such failed test: http:// logs.openstack. org/07/ 525607/ 12/check/ neutron- tempest- plugin- scenario- linuxbridge/ 09f04f9/ logs/testr_ results. html.gz
Instance checked: c6c1cdcf- c8ba-414d- a7e8-c7ac5e54e1 91
Instance fixed IP 10.1.0.4,
Floating IP: 172.24.5.14
Port's MAC: fa:16:3e:f7:57:4d
Timeline of events there is: logs.openstack. org/07/ 525607/ 12/check/ neutron- tempest- plugin- scenario- linuxbridge/ 09f04f9/ logs/screen- n-cpu.txt. gz?level= INFO#_Mar_ 06_19_52_ 59_027047 logs.openstack. org/07/ 525607/ 12/check/ neutron- tempest- plugin- scenario- linuxbridge/ 09f04f9/ logs/screen- q-svc.txt. gz#_Mar_ 06_19_52_ 45_006898 logs.openstack. org/07/ 525607/ 12/check/ neutron- tempest- plugin- scenario- linuxbridge/ 09f04f9/ logs/screen- n-cpu.txt. gz?level= INFO#_Mar_ 06_19_52_ 59_027047 logs.openstack. org/07/ 525607/ 12/check/ neutron- tempest- plugin- scenario- linuxbridge/ 09f04f9/ logs/screen- n-cpu.txt. gz?level= INFO#_Mar_ 06_19_53_ 02_159936 logs.openstack. org/07/ 525607/ 12/check/ neutron- tempest- plugin- scenario- linuxbridge/ 09f04f9/ logs/tempest. txt.gz? level=INFO# _2018-03- 06_19_53_ 08_151 logs.openstack. org/07/ 525607/ 12/check/ neutron- tempest- plugin- scenario- linuxbridge/ 09f04f9/ logs/screen- q-dhcp. txt.gz# _Mar_06_ 19_57_45_ 087006 logs.openstack. org/07/ 525607/ 12/check/ neutron- tempest- plugin- scenario- linuxbridge/ 09f04f9/ job-output. txt.gz# _2018-03- 06_19_59_ 56_375330
1. Instance spawned and paused by nova at Mar 06 19:52:59.027047: http://
2. DHCP agent send info that port is ready at Mar 06 19:52:42.035366 (but port is unbound then so it probably was created before instance)
3. Neutron sends notification to nova that port is active at Mar 06 19:52:45.006898: http://
4. Instance started and paused by nova-compute at Mar 06 19:52:59.027047: http://
5. Instance resumed by nova-compute at Mar 06 19:53:02.159936: http://
6. Tempest starts trying to ssh to instance at 2018-03-06 19:53:08.151: http://
7. First DHCPREQUEST logged in dhcp agent at Mar 06 19:57:45.087006: http://
8. Request to delete instance (failed test) sent at: 2018-03-06 19:59:56.375330: http://
So it looks like there is no problem with communication between services or with some "lag" during configuration of port by l2/dhcp agent.
I also checked directly on test node (thanks infra team) that after some time each of such instances are ready. /review. openstack. org/#/c/ 549324/ and it looks that tests are passing then but in many cases additional check_connectivity is called (so it would fail with timeout in normal case).
So I sent patch https:/