After Esx Maintainence mode recovery, when power-up of ~200 guest VM could observe Virtual-Network IP not assigned from Contrail DHCP

Bug #1518137 reported by Sarath
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Juniper Openstack
Won't Fix
Critical
Unassigned
R2.20
Won't Fix
Critical
Unassigned
R2.21.x
Won't Fix
Critical
Unassigned
Trunk
Won't Fix
Critical
Unassigned

Bug Description

After Esx Maintainence mode recovery, when I power-up ~200 guest VM's of the host, I see Virtual-Network IP not assigned from Contrail DHCP
Some more observations relevant to this problem so that easy to narrow-down,
  1) After this When I try power-down / power-up single VM (or) reboot , it gets its IP address fine

Topology
##########

3 controllers
8 Esx nodes

### Debugging with reference to Vcenter-Plugin logs, I could observe IP assignment happened...

root@oblocknode03:/var/log/contrail#
root@oblocknode03:/var/log/contrail# cat contrail-vcenter-plugin.log | grep 200-31
2015-11-18 17:01:12,831 - INFO [pool-1-thread-1:VncDB@782] - VifPlug : VN:d48ca60e-7c67-355c-b416-86cddb391481, VM:VM-200-31 (5004f568-bf12-d00e-4c96-c2296df36293), vrouterIp:172.16.80.105, EsxiHost:10.87.26.200, vlan:1980/1981
2015-11-18 17:01:12,845 - INFO [pool-1-thread-1:VncDB@869] - VRouterAPi Add Port success - interface name: vmi-testvn1-11-VM-200-31(31c7ef74-b8c9-4180-9536-47ea4acaf9e2),, VM=VM-200-31, VN=testvn1-11, vmIpAddress=2.0.1.87, vlan=1980/1981
2015-11-18 17:01:12,846 - INFO [pool-1-thread-1:VncDB@891] - VifPlug for VM:VM-200-31 (5004f568-bf12-d00e-4c96-c2296df36293) Done
2015-11-18 18:32:28,525 - INFO [pool-1-thread-1:VncDB@782] - VifPlug : VN:d48ca60e-7c67-355c-b416-86cddb391481, VM:VM-200-31 (5004f568-bf12-d00e-4c96-c2296df36293), vrouterIp:172.16.80.105, EsxiHost:10.87.26.200, vlan:1980/1981
2015-11-18 18:32:28,539 - INFO [pool-1-thread-1:VncDB@869] - VRouterAPi Add Port success - interface name: vmi-testvn1-11-VM-200-31(31c7ef74-b8c9-4180-9536-47ea4acaf9e2),, VM=VM-200-31, VN=testvn1-11, vmIpAddress=2.0.1.87, vlan=1980/1981
2015-11-18 18:32:28,539 - INFO [pool-1-thread-1:VncDB@891] - VifPlug for VM:VM-200-31 (5004f568-bf12-d00e-4c96-c2296df36293) Done
 FullFormattedMessage: VM-200-31 on 10.87.26.200 in datacenter is powered off
2015-11-19 13:28:32,936 - INFO [pool-1-thread-1:VncDB@782] - VifPlug : VN:d48ca60e-7c67-355c-b416-86cddb391481, VM:VM-200-31 (5004f568-bf12-d00e-4c96-c2296df36293), vrouterIp:172.16.80.105, EsxiHost:10.87.26.200, vlan:1980/1981
2015-11-19 13:28:32,976 - ERROR [pool-1-thread-1:VncDB@879] - VRouterAPi Add Port failed - interface name: vmi-testvn1-11-VM-200-31(31c7ef74-b8c9-4180-9536-47ea4acaf9e2), VM=VM-200-31, VN=testvn1-11, vmIpAddress=2.0.1.87, vlan=1980/1981
2015-11-19 13:28:32,976 - INFO [pool-1-thread-1:VncDB@891] - VifPlug for VM:VM-200-31 (5004f568-bf12-d00e-4c96-c2296df36293) Done
2015-11-19 13:28:33,060 - INFO [Thread-4:VncDB@782] - VifPlug : VN:d48ca60e-7c67-355c-b416-86cddb391481, VM:VM-200-31 (5004f568-bf12-d00e-4c96-c2296df36293), vrouterIp:172.16.80.105, EsxiHost:10.87.26.200, vlan:1980/1981
2015-11-19 13:28:33,074 - ERROR [Thread-4:VncDB@879] - VRouterAPi Add Port failed - interface name: vmi-testvn1-11-VM-200-31(31c7ef74-b8c9-4180-9536-47ea4acaf9e2), VM=VM-200-31, VN=testvn1-11, vmIpAddress=2.0.1.87, vlan=1980/1981
2015-11-19 13:28:33,074 - INFO [Thread-4:VncDB@891] - VifPlug for VM:VM-200-31 (5004f568-bf12-d00e-4c96-c2296df36293) Done
 FullFormattedMessage: VM-200-31 on 10.87.26.200 in datacenter is powered on
2015-11-19 14:51:24,670 - INFO [pool-1-thread-1:VncDB@782] - VifPlug : VN:d48ca60e-7c67-355c-b416-86cddb391481, VM:VM-200-31 (5004f568-bf12-d00e-4c96-c2296df36293), vrouterIp:172.16.80.105, EsxiHost:10.87.26.200, vlan:1980/1981
2015-11-19 14:51:24,685 - INFO [pool-1-thread-1:VncDB@869] - VRouterAPi Add Port success - interface name: vmi-testvn1-11-VM-200-31(31c7ef74-b8c9-4180-9536-47ea4acaf9e2),, VM=VM-200-31, VN=testvn1-11, vmIpAddress=2.0.1.87, vlan=1980/1981
2015-11-19 14:51:24,685 - INFO [pool-1-thread-1:VncDB@891] - VifPlug for VM:VM-200-31 (5004f568-bf12-d00e-4c96-c2296df36293) Done

### Version

root@oblocknode02:/var#
root@oblocknode02:/var#
root@oblocknode02:/var# contrail-version
Package Version Build-ID | Repo | Package Name
-------------------------------------- ------------------------------ ----------------------------------
contrail-analytics 2.21-103 103
contrail-config 2.21-103 103
contrail-control 2.21-103 103
contrail-dns 2.21-103 103
contrail-f5 2.21-103 103
contrail-fabric-utils 2.21-103 103
contrail-install-packages 2.21-103~vcenter 103
contrail-install-vcenter-plugin 2.21-103 103
contrail-lib 2.21-103 103
contrail-nodemgr 2.21-103 103
contrail-openstack-analytics 2.21-103 103
contrail-openstack-control 2.21-103 103
contrail-openstack-database 2.21-103 103
contrail-openstack-webui 2.21-103 103
contrail-setup 2.21-103 103
contrail-utils 2.21-103 103
contrail-vmware-config 2.21-103 103
contrail-vmware-utils 2.21-103 103
contrail-web-controller 2.21-103 103
contrail-web-core 2.21-103 103
ifmap-python-client 0.1-2 103
ifmap-server 0.3.2-1contrail1 103
python-contrail 2.21-103 103
root@oblocknode02:/var#
root@oblocknode02:/var#

Sarath (nsarath)
summary: - Part of Esx Maintainence mode recovery, when power-up of ~200 guest VM
+ After Esx Maintainence mode recovery, when power-up of ~200 guest VM
could observe Virtual-Network IP not assigned from Contrail DHCP
Revision history for this message
Sarath (nsarath) wrote :

Please find below Controllers and Compute ( Relevant Host ) vrouter logs,

-bash-4.1$
-bash-4.1$ pwd
/users/nsarath/PR/PR-891
-bash-4.1$
-bash-4.1$ ls -l
total 2187080
-rwxrwxrwx 1 nsarath test 21278720 Nov 19 16:51 Compute-0-log.tar*
-rwxrwxrwx 1 nsarath test 926013440 Nov 19 16:32 Ctrl-A-log.tar*
-rwxrwxrwx 1 nsarath test 890501120 Nov 19 16:29 Ctrl-B-log.tar*
-rwxrwxrwx 1 nsarath test 392970240 Nov 19 16:29 Ctrl-C-log.tar*
-bash-4.1$

Sarath (nsarath)
Changed in juniperopenstack:
importance: Undecided → Critical
Revision history for this message
Sarath (nsarath) wrote :

Today Debugging this problem with Development (Sachin) and followup discussions below to be noted,
   1) Both Bug#1518137 and Bug#1519124, possibly narrow-down to same root-cause/theory and this is timing issue (inconsistent) depends on the sequence of DHCP request and VM powerON event and Plugin VM events processed and this could impact only for Guest-VM's which has less than 10 seconds max timeout for DHCP
   2) The possible contrail fix could help this if slightly lowering scan timer and Sachin prefer to put this only for 3.0 release to get more-tested and so for customer releases 2.21 and 2.22 we will release-note this bug for customer awareness

Release-Notes
#################
cirros-vm (or) any flavor of vm when having default of 9 seconds max timeout for DHCP from power-on, it may not get its IP address from Contrail DHCP during Power-on. The work-around is either to increase the DHCP max timeout configs (or) to request for new DHCP after power-on.

Revision history for this message
Sarath (nsarath) wrote :

Release-Notes:cirros-vm (or) any flavor of vm when having default of < 9 seconds max timeout for DHCP from power-on, it may not get its IP address from Contrail DHCP during Power-on.
The work-around is either to increase the DHCP max timeout configs (or) to request for new DHCP after power-on.

Sarath (nsarath)
tags: added: releasenote
Revision history for this message
Sarath (nsarath) wrote :

Please find modified Release-notes for correction,

Release-Notes: cirros-vm (or) any flavor of vm when having default of < 20 seconds max timeout for DHCP from power-on, it may not get its IP address from Contrail DHCP during Power-on.
The work-around is either to increase the DHCP max timeout configs (or) to request for new DHCP after power-on.

no longer affects: juniperopenstack/r3.0
Revision history for this message
Sarath (nsarath) wrote :

This issue also seen on R3.0. and pls release note also for R3.0 as customers also take Impact of this issue.

pls find logs @

-bash-4.1$
-bash-4.1$ pwd
/cs-shared/bugs/1518137
-bash-4.1$
-bash-4.1$ ls -l
total 2464524
-rwxrwxrwx 1 nsarath test 904734720 Feb 25 11:56 Ctrl-A-log.tar*
-rwxrwxrwx 1 nsarath test 745287680 Feb 25 11:56 Ctrl-B-log.tar*
-rwxrwxrwx 1 nsarath test 718868480 Feb 25 11:56 Ctrl-C-log.tar*
-rwxrwxrwx 1 nsarath test 19957760 Feb 25 11:56 Vrtr-0-log.tar*
-rwxrwxrwx 1 nsarath test 28477440 Feb 25 11:56 Vrtr-1-log.tar*
-rwxrwxrwx 1 nsarath test 19865600 Feb 25 11:56 Vrtr-3-log.tar*
-rwxrwxrwx 1 nsarath test 19046400 Feb 25 11:56 Vrtr-4-log.tar*
-rwxrwxrwx 1 nsarath test 19179520 Feb 25 11:56 Vrtr-5-log.tar*
-rwxrwxrwx 1 nsarath test 19210240 Feb 25 11:56 Vrtr-7-log.tar*
-rwxrwxrwx 1 nsarath test 19077120 Feb 25 11:56 Vrtr-8-log.tar*
-bash-4.1$

Andra Cismaru (andra)
Changed in juniperopenstack:
status: New → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.