VMs not pingable after host reboot

Bug #1791818 reported by Peng Peng
This bug affects 1 person
Affects   | Status    | Importance | Assigned to    | Milestone
StarlingX | Won't Fix | Medium     | Steven Webster |

Bug Description

Brief Description
-----------------
Boot VMs on a single-node system and confirm that they are pingable. After the host is rebooted, all VMs return to Active state, but some of them are not pingable.

Severity
--------
Major

Steps to Reproduce
------------------
1. Boot 5 VMs and ping each of them
2. Force-reboot the host (reboot -f)
3. Check that the VMs are in Active state
4. Ping the VMs again

Expected Behavior
------------------
All pings succeed

Actual Behavior
----------------
One or more VMs fail to respond to ping

Reproducibility
---------------
Intermittent (reproduced in about 4 of 10 attempts)

System Configuration
--------------------
One node system

Branch/Pull Time/Commit
-----------------------
master as of 2018-09-09_20-18-00

Timestamp/Logs
--------------
[2018-09-10 06:05:46,435] 264 DEBUG MainThread ssh.send :: Send 'nova --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://127.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-region-name RegionOne list --all-tenants'
[2018-09-10 06:05:50,108] 391 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+--------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------------------+
| ID | Name | Tenant ID | Status | Task State | Power State | Networks |
+--------------------------------------+--------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------------------+
| 7379e11d-a9b6-49f7-9b52-3324c8d7d28b | tenant2-image_ephemswap-6 | b4fec267b80a4375a92493443afbaa56 | ACTIVE | - | Running | tenant2-net1=172.18.1.135; tenant2-mgmt-net=192.168.251.9 |

[2018-09-10 06:05:52,424] 2628 INFO MainThread network_helper.ping_server:: All packets received by 192.168.251.9
[2018-09-10 06:05:52,424] 428 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2018-09-10 06:05:52,424] 264 DEBUG MainThread ssh.send :: Send 'ping -c 3 192.168.251.11'
[2018-09-10 06:05:55,534] 391 DEBUG MainThread ssh.expect :: Output:
PING 192.168.251.11 (192.168.251.11) 56(84) bytes of data.
From 192.168.51.3 icmp_seq=1 Destination Host Unreachable
From 192.168.51.3 icmp_seq=2 Destination Host Unreachable
From 192.168.51.3 icmp_seq=3 Destination Host Unreachable

--- 192.168.251.11 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2016ms
pipe 3

Peng Peng (ppeng)
description: updated
Frank Miller (sensfan22) wrote :

As this issue occurs at a fairly high rate (40%), marking it as gating for the 2018.10 milestone.

Changed in starlingx:
assignee: nobody → Ghada Khalil (gkhalil)
status: New → Triaged
Ghada Khalil (gkhalil)
tags: added: stx.2018.10 stx.networking
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: Ghada Khalil (gkhalil) → nobody
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Steven Webster (swebster-wr)
Steven Webster (swebster-wr) wrote :

The detailed logs from this case show that the VM's first DHCP attempt fails and a later attempt succeeds 5 minutes afterwards. The DHCP configuration in the VM is set for a 5-minute retry. A VM can therefore be launched and attempt DHCP before the DHCP server has recovered; in that case, the VM eventually gets an address on retry.
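
To illustrate the timing, here is a minimal sketch of the retry arithmetic. It assumes the VM attempts DHCP at boot and then every 5 minutes (the retry interval mentioned above), and that an attempt succeeds only if the DHCP server is already up. The function name and the simplified model are illustrative assumptions, not taken from the VM's DHCP client.

```python
import math


def first_successful_dhcp_attempt(server_ready_s, retry_interval_s=300):
    """Return the time (seconds after VM boot) of the first DHCP attempt
    made at or after the moment the DHCP server becomes ready.

    Assumption: attempts happen at t = 0, retry_interval_s,
    2 * retry_interval_s, ... and succeed only if the server is up.
    """
    if server_ready_s <= 0:
        # Server was ready before the VM's first attempt.
        return 0
    # Round up to the next retry boundary.
    return math.ceil(server_ready_s / retry_interval_s) * retry_interval_s
```

Under this model, a DHCP server that recovers even a few seconds after the VM's first attempt leaves the VM without an address until the 5-minute retry.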

Ghada Khalil (gkhalil) wrote :

As reviewed with Steve, the delay before ping starts working is expected in certain scenarios, and the test case needs to be more forgiving to cover them. There is no way to coordinate the DHCP server being ready with nova bringing up the VM, so no software change is planned for this issue.

Marking as Won't Fix.
The recommendation is to update the test case to keep attempting the ping for a longer period, long enough to cover the audit window (6-7 minutes).
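
A rough sketch of what such a retry loop could look like in a Python test helper is shown below. The names ping_once and wait_for_ping, the 420-second default (the upper end of the 6-7 minute window) and the 15-second polling interval are all assumptions made for illustration; they are not the test framework's actual API. The ping check relies on the Linux ping command exiting with status 0 when at least one reply is received.

```python
import subprocess
import time


def ping_once(ip, count=3):
    """Return True if `ping -c <count> <ip>` exits with status 0."""
    result = subprocess.run(
        ["ping", "-c", str(count), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_ping(ip, timeout=420, interval=15, ping=ping_once,
                  clock=time.monotonic, sleep=time.sleep):
    """Retry the ping until it succeeds or `timeout` seconds have elapsed.

    `ping`, `clock` and `sleep` are injectable so the loop can be
    exercised without a real network or real waiting.
    """
    deadline = clock() + timeout
    while True:
        if ping(ip):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With this shape, the post-reboot check would call wait_for_ping("192.168.251.11") instead of failing on the first unreachable result.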

Changed in starlingx:
importance: Undecided → Medium
status: Triaged → Won't Fix
Ken Young (kenyis)
tags: added: stx.1.0
removed: stx.2018.10