'check_instance_connectivity' is called too early after creating an instance

Bug #1497572 reported by Dennis Dmitriev
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Dmitry Tyzhnenko

Bug Description

Reproduced on CI: https://product-ci.infra.mirantis.net/job/7.0.system_test.ubuntu.ha_neutron_destructive_2/39/

The system test 'neutron_l3_migration_after_reset' performs the following scenario (which is not properly described in its docstring):

...
- Create an instance with a key pair
- Manually reschedule router from primary controller to another one
- Check network connectivity from instance via dhcp namespace
...

As a result, the test starts checking connectivity from the instance to 8.8.8.8 before the instance has started its network services.

In check_instance_connectivity(), the command is wrapped in the 'wait' method:
wait(lambda: remote.execute(cmd)['exit_code'] == 0, timeout=2 * 60)
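
For readers unfamiliar with this pattern, here is a minimal sketch of what a polling helper of this kind typically does. It is not the actual 'wait' implementation used by the test framework; the `interval`, `sleep` and `clock` parameters are assumptions added for illustration.

```python
import time


def wait(predicate, timeout=120, interval=5,
         sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it is truthy or `timeout` seconds have passed.

    Illustrative sketch only, not the framework's own helper.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            raise TimeoutError(
                "condition not met within %s seconds" % timeout)
        sleep(interval)
```

Note that such a helper only retries a predicate that returns a false value. If the predicate raises (for example because ssh is not reachable yet), the exception propagates on the first attempt unless the predicate catches it.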

### From https://product-ci.infra.mirantis.net/job/7.0.system_test.ubuntu.ha_neutron_destructive_2/39/artifact/logs/sys_test.log :
http://paste.openstack.org/show/469587/

By the time ssh had started on the instance, the neutron l3 router had already been migrated, and the first ping from the instance to 8.8.8.8 failed:

AssertionError: Instance has no connectivity, exit code 1,stdout ['PING 8.8.8.8 (8.8.8.8): 56 data bytes\n', '\n', '--- 8.8.8.8 ping statistics ---\n', '1 packets transmitted, 0 packets received, 100% packet loss\n'], stderr []

But subsequent pings, started manually, work fine:

root@node-2:~# ip netns exec qrouter-1a851cee-c1a0-4c02-8d4d-62493f3cbd0a bash
root@node-2:~# ssh cirros@192.168.111.4
The authenticity of host '192.168.111.4 (192.168.111.4)' can't be established.
RSA key fingerprint is e7:6d:36:7e:61:bb:1a:ec:32:07:c7:4d:77:fc:c4:10.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.111.4' (RSA) to the list of known hosts.
cirros@192.168.111.4's password:
$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=55 time=4.754 ms
64 bytes from 8.8.8.8: seq=1 ttl=55 time=4.520 ms
64 bytes from 8.8.8.8: seq=2 ttl=55 time=5.508 ms
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 4.520/4.927/5.508 ms

===================================

In the system test, we should perform more than just a single ping.

The test can be extended so that it does not just ping something outside the cluster, but does something like the following:
- establish a TCP/IP connection to some external resource;
- check that the connection is alive (perform some request, do not close the connection);
- migrate the neutron l3 router;
- check that the connection is still alive (perform some request using the established connection).

TCP/IP connections should stay alive after the migration, even if the TCP stack has to perform some retries in cases like the one described above.

This approach could also be used instead of 'downloading a file' in the appropriate tests, to make sure that the connection remains alive.
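
The proposed scenario could be sketched roughly as follows. This is only an illustration of the idea, not code from fuel-qa: the function names and the `disruptive_action` callback (which would reschedule the router) are hypothetical, and in the real test the client side would have to run on the instance rather than on the test host.

```python
import socket


def exchange(sock, payload, bufsize=4096):
    """Send payload over an already-open socket and return the reply."""
    sock.sendall(payload)
    reply = sock.recv(bufsize)
    if not reply:
        raise ConnectionError("peer closed the connection")
    return reply


def check_connection_survives(host, port, payload,
                              disruptive_action, timeout=60):
    """Use one TCP connection before and after disruptive_action().

    The socket is never reopened, so the second exchange succeeds only
    if the established connection survived (TCP retries are allowed
    within `timeout` seconds).
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        before = exchange(sock, payload)  # connection works beforehand
        disruptive_action()               # e.g. migrate the l3 router
        after = exchange(sock, payload)   # same socket, no reconnect
    return before, after
```

The socket timeout bounds how long the check tolerates packet loss during the migration before it gives up.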

Changed in fuel:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-qa (master)

Fix proposed to branch: master
Review: https://review.openstack.org/232561

Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Dmitry Tyzhnenko (dtyzhnenko)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/232561
Committed: https://git.openstack.org/cgit/stackforge/fuel-qa/commit/?id=b9127d0ea0f409ac39d21d45cf3378597254542e
Submitter: Jenkins
Branch: master

commit b9127d0ea0f409ac39d21d45cf3378597254542e
Author: Dmitry Tyzhnenko <email address hidden>
Date: Thu Oct 8 17:21:25 2015 +0300

    Wait ssh on instance before connect to it

    Add wait ssh into check_instance_connectivity method

    Change-Id: I18bbc7fa0ba7c201e8308f70c4fd8a6e3fdc9416
    Closes-bug: 1497572
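
As an illustration of what "wait ssh" can mean in practice, a single readiness probe might look like the sketch below. It is not the code from the merged change: the function name and parameters are hypothetical, and it only checks that the TCP port accepts connections, not that an ssh login succeeds.

```python
import socket


def ssh_port_open(host, port=22, connect_timeout=5):
    """Return True if host:port accepts a TCP connection right now."""
    try:
        with socket.create_connection((host, port),
                                      timeout=connect_timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: sshd is not ready yet.
        return False
```

A probe like this can serve as the predicate for a 'wait'-style poll, so that the connectivity command is sent only after the instance accepts ssh connections.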

Changed in fuel:
status: In Progress → Fix Committed
Dmitry Pyzhov (dpyzhov)
tags: added: area-qa
Changed in fuel:
status: Fix Committed → Fix Released