OSTF test "Check network connectivity from instance via floating IP" on ping 8.8.8.8 from instance

Bug #1259923 reported by Anastasia Palkina
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Fix Released
Importance: Medium
Assigned to: Anastasiia Naboikina

Bug Description

ISO #124
"release": "4.0",
"nailgun_sha": "8d80f823c38c2af6dc98173bcbe348d022960a3d",
"ostf_sha": "cf48dac2a6e7ad284fc93c529f3d1e4668504028",
"astute_sha": "ae026938f272f69afbe89c9900bf1c3df483557c",
"fuellib_sha": "687a554eb9b6ae4dcc114f34e9690e601b40610c"

1. Create new environment (CentOS, simple mode)
2. Choose Ceph for images
3. Add controller, compute, 3 ceph
4. Start deployment. It was successful
5. Start OSTF tests. Test "Check network connectivity from instance via floating IP" has failed on step 5 with error: Time limit exceeded while waiting for public connectivity checking from VM to finish. Please refer to OpenStack logs for more details.

6. Also, the Actual Duration is more than the Expected Duration (see screenshot)
7. I manually created a security group and an instance, and I can ping 8.8.8.8 from the instance

[root@node-5 ~]# nova floating-ip-create nova
+--------------+-----------+----------+------+
| Ip | Server Id | Fixed Ip | Pool |
+--------------+-----------+----------+------+
| 172.16.0.128 | | None | nova |
+--------------+-----------+----------+------+
[root@node-5 ~]# nova add-floating-ip 4842ce6c-e3d6-4c6f-8909-1fcd0218cbca 172.16.0.128
[root@node-5 ~]# ssh cirros@172.16.0.128
Warning: Permanently added '172.16.0.128' (RSA) to the list of known hosts.
cirros@172.16.0.128's password:
$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=45 time=66.187 ms
64 bytes from 8.8.8.8: seq=1 ttl=45 time=61.305 ms
64 bytes from 8.8.8.8: seq=2 ttl=45 time=61.271 ms
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 61.271/62.921/66.187 ms

Anastasia Palkina (apalkina) wrote :
description: updated
Changed in fuel:
importance: Undecided → High
Mike Scherbakov (mihgen) wrote :

Lowered importance to medium: it affects only one HealthCheck test, right?

Changed in fuel:
importance: High → Medium
Anastasiia Naboikina (anaboikina) wrote :

Yes, Mike. And so far I haven't been able to reproduce this issue.

Anastasiia Naboikina (anaboikina) wrote :

So far everything works on bare-metal and on VirtualBox. A fix will be provided once there is an environment where the issue is reproduced.

Changed in fuel:
status: New → Incomplete
Changed in fuel:
assignee: Anastasiia Naboikina (anaboikina) → nobody
Aleksandr Didenko (adidenko) wrote :

I get the same problem sporadically on bare-metal.

ISO: {"build_id": "2014-01-08_01-17-41", "ostf_sha": "05b1bfc92fa40728966518992c94176030c25f35", "build_number": "21", "nailgun_sha": "ea5dc470d3b789ad8dc0be2c256b9c7ec6ac8500", "fuelmain_sha": "73fb5d8bba72cb032b7ed4d034827b75042d0c47", "astute_sha": "419e864c77681c355043c21e76b3f5a940bef9ea", "release": "4.1", "fuellib_sha": "f661314923ce584ad84c40c7a99e80a8feb4e713"}

This problem occurs only on environments with "neutron" (with both VLAN segmentation and GRE tunnels).

The "Check network connectivity from instance via floating IP" OSTF check takes 25-27 seconds to complete on a "nova network" configuration, but on "neutron" it takes at least 85 seconds (85-100 seconds on average). Sometimes it hits the retries*timeout limit for the SSH connection to the instance via floating IP. In both cases (nova-network and neutron), the floating IP assigned to the instance is accessible (pingable) in ~25 seconds.

So to summarize:
- nova-network: OSTF "Check network connectivity from instance via floating IP" check takes ~30 seconds (floating IP is "pingable" in ~25 seconds)
- neutron: OSTF "Check network connectivity from instance via floating IP" check takes ~80 seconds minimum (floating IP is "pingable" in ~25 seconds)
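The figures above can be read as a timing budget: the check passes only if SSH becomes reachable before retries*timeout runs out. A minimal Python sketch of that arithmetic follows. The retry count and per-try timeout are hypothetical values (this report does not state the actual OSTF settings); the durations are the ones quoted in this comment.

```python
def ssh_budget(retries, timeout_s):
    """Total time the check is willing to wait for SSH, in seconds."""
    return retries * timeout_s


def fits(ready_after_s, retries, timeout_s):
    """True if SSH comes up before the retry budget is exhausted."""
    return ready_after_s <= ssh_budget(retries, timeout_s)


# Hypothetical settings, NOT taken from fuel-ostf.
RETRIES, TIMEOUT_S = 9, 10

# Check durations quoted in this comment, in seconds.
observed = {"nova-network": 30, "neutron (fastest)": 80, "neutron (slow run)": 100}

for name, seconds in observed.items():
    verdict = "ok" if fits(seconds, RETRIES, TIMEOUT_S) else "time limit exceeded"
    print(f"{name}: {seconds}s vs budget {ssh_budget(RETRIES, TIMEOUT_S)}s -> {verdict}")
```

With a budget this close to the neutron timings, a fast run passes and a slow run fails, which would match the sporadic failures described here.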

So this time difference is not caused by network and/or floating IP configuration; it seems to be caused by a delayed SSH service start. It looks like this happens because of the "/etc/rc3.d/S45-cirros-net-ds" start script from the Cirros TestVM image, which takes ~45 seconds to complete on a neutron environment:

# time /sbin/cirros-ds net
cirros-ds 'net' up at 160.02
checking http://169.254.169.254/2009-04-04/instance-id
successful after 1/20 tries: up 160.03. iid=i-00000009
failed to get http://169.254.169.254/2009-04-04/meta-data// <head>/openssh-key
warning: no ec2 metadata for public-keys
failed to get http://169.254.169.254/2009-04-04/meta-data/instance-id
warning: no ec2 metadata for instance-id
failed to get http://169.254.169.254/2009-04-04/meta-data// <head>/openssh-key
warning: no ec2 metadata for public-keys
failed to get http://169.254.169.254/2009-04-04/meta-data/hostname
warning: no ec2 metadata for hostname
found datasource (ec2, net)
real 0m 45.55s
user 0m 0.00s
sys 0m 0.00s

The same command on an instance in a nova-network environment takes only a few seconds:

# time /sbin/cirros-ds net
cirros-ds 'net' up at 44.18
checking http://169.254.169.254/2009-04-04/instance-id
successful after 1/20 tries: up 44.18. iid=i-0000000a
found datasource (ec2, net)
real 0m 3.45s
user 0m 0.00s
sys 0m 0.00s
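To put a number on the difference between the two runs, the "real" lines can be compared directly. A small Python sketch, assuming the BusyBox-style `time` format shown in the two outputs above; the helper name `real_seconds` is made up for illustration:

```python
import re


def real_seconds(time_output):
    """Extract the 'real' wall-clock time, in seconds, from `time` output."""
    match = re.search(r"real\s+(\d+)m\s*([\d.]+)s", time_output)
    if match is None:
        raise ValueError("no 'real' line found")
    return int(match.group(1)) * 60 + float(match.group(2))


# The two 'real' lines quoted in this comment.
neutron = real_seconds("real 0m 45.55s")
nova_network = real_seconds("real 0m 3.45s")

print(f"extra delay on neutron: {neutron - nova_network:.2f}s")
```

The roughly 42 seconds of extra time spent in cirros-ds is in the same range as the SSH start delay discussed in this comment.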

Example from /var/log/messages on the cirros instance with neutron. Note the time difference between instance start and the start of dropbear (the SSH service):

Jan 9 04:23:05 cirros kern.info kernel: [ 1.168128] Refined TSC clocksource calibration: 3499.947 MHz.
Jan 9 04:23:15 cirros kern.debug kernel: [ 11.736099] eth0: no IPv6 routers present
Jan 9 04:23:50 cirros auth.notice su: + /dev/console root:cirros
Jan 9 04:23:50 cirros authpriv.info dropbear[289]: Running in background

The same part of /var/log/messages from the cirros instance with nova-network:

Jan 9 06:13:23 cirros kern...


Changed in fuel:
status: Incomplete → Confirmed
Anastasia Palkina (apalkina) wrote :

Also reproduced on ISO #30
"build_id": "2014-01-13_01-17-41",
"ostf_sha": "05b1bfc92fa40728966518992c94176030c25f35",
"build_number": "30",
"nailgun_sha": "b4af350fc72ebaa26310cdb2bde71a466b16d598",
"fuelmain_sha": "4b7628de62b5dc0301ba3553a1aa2437d233e95f",
"astute_sha": "ca787b5b0a3a418e6885b9fd2d795c9fd158ed0a",
"release": "4.1",
"fuellib_sha": "c8673bb9474ccb0a51fb9077910b009ff2d9034b"

From /var/log/messages from the cirros instance with neutron (VLAN)
Jan 13 03:11:05 cirros kern.debug kernel: [ 29.681432] eth0: no IPv6 routers present
Jan 13 03:11:46 cirros auth.notice su: + /dev/console root:cirros
Jan 13 03:12:27 cirros authpriv.info dropbear[495]: Running in background
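The delay before SSH is available can be measured from these log lines. A rough Python sketch, assuming standard syslog timestamps (which omit the year, so 2014 is filled in from the build_id above) and English month abbreviations; the `stamp` helper is hypothetical:

```python
from datetime import datetime


def stamp(line, year=2014):
    """Parse the 'Mon D HH:MM:SS' prefix of a syslog line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


# Log lines copied from the excerpt above.
lines = [
    "Jan 13 03:11:05 cirros kern.debug kernel: [ 29.681432] eth0: no IPv6 routers present",
    "Jan 13 03:11:46 cirros auth.notice su: + /dev/console root:cirros",
    "Jan 13 03:12:27 cirros authpriv.info dropbear[495]: Running in background",
]

first = stamp(lines[0])
dropbear = stamp(next(l for l in lines if "dropbear" in l))
gap = (dropbear - first).total_seconds()

print(f"dropbear started {gap:.0f}s after the first logged line")
```

Since the first line is already about 30 seconds into boot (kernel timestamp 29.68), the SSH service is not available until well over a minute and a half after the instance starts.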

Changed in fuel:
milestone: 4.0 → 4.1
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-ostf (master)

Fix proposed to branch: master
Review: https://review.openstack.org/68891

Changed in fuel:
assignee: nobody → Anastasiia Naboikina (anaboikina)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-ostf (master)

Reviewed: https://review.openstack.org/68891
Committed: https://git.openstack.org/cgit/stackforge/fuel-ostf/commit/?id=338ddf840c229918d1df8c6597588b853d02de4c
Submitter: Jenkins
Branch: master

commit 338ddf840c229918d1df8c6597588b853d02de4c
Author: AnastasiiaNaboikina <email address hidden>
Date: Fri Jan 24 13:19:43 2014 +0200

    Add more retries for connectivity check

    Increased number of retries for instance connectivity
    checks in test_nova_create_instance_with_connectivity
    to prevent failures when running tests on Neutron

    Change-Id: I429080524cbd04dae2db40cdba7d0feec63e5205
    Closes-bug: 1259923
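For readers unfamiliar with this kind of fix, the idea of raising the retry count can be sketched as below. This is an illustration only, not the fuel-ostf code: the function names, the default port, and the injectable `probe`/`sleep` parameters are all assumptions made so the example can run without a real instance.

```python
import socket
import time


def port_open(host, port, timeout_s):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def wait_for_ssh(host, retries, timeout_s, port=22, probe=port_open, sleep=time.sleep):
    """Probe the SSH port up to `retries` times.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed.
    """
    for attempt in range(1, retries + 1):
        if probe(host, port, timeout_s):
            return attempt
        sleep(timeout_s)
    return None


def ready_on(n):
    """Build a fake probe that starts succeeding on its n-th call (simulates a slow boot)."""
    calls = {"count": 0}

    def probe(host, port, timeout_s):
        calls["count"] += 1
        return calls["count"] >= n

    return probe


def no_sleep(seconds):
    """Stand-in for time.sleep so the simulation finishes instantly."""


# Simulated instance whose SSH port answers only on the 9th probe.
few = wait_for_ssh("172.16.0.128", retries=5, timeout_s=10, probe=ready_on(9), sleep=no_sleep)
more = wait_for_ssh("172.16.0.128", retries=10, timeout_s=10, probe=ready_on(9), sleep=no_sleep)
print(f"5 retries -> {few}, 10 retries -> {more}")
```

With the smaller retry count the simulated check gives up before SSH is ready; with the larger one it succeeds, at the cost of a longer worst-case test duration.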

Changed in fuel:
status: In Progress → Fix Committed
Anastasia Palkina (apalkina) wrote :

Verified on ISO #102
"build_id": "2014-02-07_13-44-21",
"ostf_sha": "d15d6b5b952e455e3afff383413ffa6d89ee7981",
"build_number": "102",
"nailgun_sha": "2d6aa79b9ed01ba166a0db543e19cd6d1d844503",
"fuelmain_sha": "7b8a9343db8d3c9cc5ba249768d72ea6d15d1a11",
"astute_sha": "d002c3bf626cff96a1d4aec9eb92fc4d5f4542c4",
"release": "4.1",
"fuellib_sha": "c10d3896a6b1b760229cc8ebd93778a48b571ee6"

Changed in fuel:
status: Fix Committed → Fix Released