Control node service does not come up after restart under continuous object add/delete

Bug #1527551 reported by Vedamurthy Joshi
Affects              Status          Importance  Assigned to    Milestone
Juniper Openstack    Status tracked in Trunk
  R2.20              Won't Fix       Medium      Nischal Sheth
  R3.0               Fix Committed   Medium      Nischal Sheth
  Trunk              Fix Committed   Medium      Nischal Sheth

Bug Description

R2.20 Build 115, Ubuntu 14.04, Juno, multinode setup

I have a 3-controller-node environment in which one script repeatedly adds 30 VNs, spawns 30 VMs on them, and then deletes them.
Another script creates a VN, a LIF, and a VMI, and then deletes them.

In parallel, a third script restarts the 3 control nodes at random intervals.
Sometimes a control node does not come up after the restart; contrail-status reports that the IFMap Server End-Of-RIB is not computed.

Prakash will update the bug with his findings. A gcore of the control node from one of the nodes will be at http://10.204.216.50/Docs/bugs/#

root@nodec2:/var/log/contrail# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control initializing (IFMap Server End-Of-RIB not computed)
contrail-control-nodemgr active
contrail-dns active
contrail-named active

--------------------

root@nodec1:~# cat bug-recreate.sh
device_id="fbaa5ca1-981c-4bfb-9d14-f44f2fff90d0"
mac="00:25:90:c3:09:6d"

while :
do
   neutron net-create bug-vn
   neutron subnet-create bug-vn 100.1.1.0/24
   vn_id=`neutron net-show bug-vn | grep " id " | awk '{ print $4}'`
   python config-tor-intf.py "ge-0/0/0" "$device_id" "ge-0/0/0.0" 0 1 "$vn_id" "$mac"
   sleep 20
   python del-lifs-vmis.py
   neutron net-delete bug-vn
   sleep 20
done

root@nodec1:~# cat test1.sh
source /etc/contrail/openstackrc

image_id="e9693c2b-f6ba-4830-a856-3e5b0f31db22"

while :
do

    for i in {1..30};
    do
           neutron net-create bugvn$i
           neutron subnet-create bugvn$i 100.$i.$i.0/24
           vn_id=`neutron net-show bugvn$i | grep " id " | awk '{ print $4}'`
           nova boot --nic net-id=$vn_id --flavor 1 --image $image_id vm_$i
    done
    for i in {1..30};
    do
           nova delete vm_$i
    done
    for i in {1..30};
    do
        neutron net-delete bugvn$i
    done

done
root@nodec1:~#
-------------------
bash-4.2$ cat restart_control1.sh
cmd="service contrail-control restart"
SSHOPT="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null "
while :
do
   sshpass -p c0ntrail123 ssh $SSHOPT root@nodec1 $cmd
# sleep $[ ( $RANDOM % 200 ) + 1 ]s
   sshpass -p c0ntrail123 ssh $SSHOPT root@nodec2 $cmd
# sleep $[ ( $RANDOM % 200 ) + 1 ]s
   sshpass -p c0ntrail123 ssh $SSHOPT root@nodec3 $cmd
   sleep $[ ( $RANDOM % 200 ) + 1 ]s

done
bash-4.2$

Prakash Bailkeri (prakashmb) wrote :

EOR is assumed once there has been no activity for 10ms (the default EOR timeout). In this case the scripts modify the configuration continuously (POST/DELETE/PUT), so that period of inactivity never occurs and EOR is never declared as computed.

ifmap/client/ifmap_channel.cc: in IFMapChannel::ReadPollResponse(),

...
..
        if (!end_of_rib_computed()) {
            // When the daemon is coming up, as long as we are receiving data,
            // we have not received the entire db. Keep re-arming the EOR timer
            // as long as we are receiving data.
            StartEndOfRibTimer();
        }
...
...
Because of the config scripts, the IFMap server sends PollResponse messages continuously, so the timer is re-armed every time and EOR is never computed.
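
To make the starvation concrete, here is a minimal sketch of a timer that is re-armed on every poll response, driven by a logical clock. It is only a model of the behaviour described above, not the contrail-controller code: InactivityTimer and EorTimeWithRearm are hypothetical names, and the 10ms timeout used in the usage note is simply the figure quoted in this comment.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of an inactivity ("End-Of-RIB") timer on a logical
// clock in milliseconds. Not the contrail-controller implementation.
class InactivityTimer {
public:
    explicit InactivityTimer(int64_t timeout_ms) : timeout_ms_(timeout_ms) {}

    // Arm (or re-arm) the timer so that it fires timeout_ms after now_ms.
    void Start(int64_t now_ms) { deadline_ms_ = now_ms + timeout_ms_; }

    // True once the clock has reached the current deadline.
    bool Expired(int64_t now_ms) const { return now_ms >= deadline_ms_; }

    int64_t deadline_ms() const { return deadline_ms_; }

private:
    int64_t timeout_ms_;
    int64_t deadline_ms_ = 0;
};

// Replays poll responses arriving at the given (sorted) times and re-arms
// the timer on each one, mirroring the "keep re-arming while data arrives"
// rule. Returns the time at which the timer fires, or -1 if it is still
// pending after the last response has been processed.
int64_t EorTimeWithRearm(const std::vector<int64_t>& arrivals_ms,
                         int64_t timeout_ms) {
    InactivityTimer timer(timeout_ms);
    timer.Start(0);  // armed when the daemon comes up
    for (int64_t t : arrivals_ms) {
        if (timer.Expired(t)) {
            return timer.deadline_ms();  // a quiet gap let the timer fire
        }
        timer.Start(t);  // data is still arriving: push the deadline out
    }
    return -1;  // responses never paused for a full timeout
}
```

With a 10ms timeout, responses at 5, 10 and 40 let the timer fire at 20, because the gap after 10 exceeds the timeout. Responses arriving every 5ms never leave such a gap, so the function returns -1, which corresponds to the control node staying in the "End-Of-RIB not computed" state.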

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/20025
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/20026
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/20025
Committed: http://github.org/Juniper/contrail-controller/commit/b4f5f3679f684a4ea82f86ffc71270abeaf4f020
Submitter: Zuul
Branch: master

commit b4f5f3679f684a4ea82f86ffc71270abeaf4f020
Author: Nischal Sheth <email address hidden>
Date: Sun May 8 20:46:18 2016 -0700

Refine stale cleanup and EOR heuristics

Restart stale cleanup and EOR timers only if we receive searchResult.
If we receive updateResult or deleteResult, artificially expire the
relevant timers right away.

This handles the corner case where continuous object add/delete can
keep the CN from becoming active.

Note that this fix does not reintroduce bug 1501425.

Change-Id: Ib2f790d323f28c8403f8bcf20ba18fa782a03967
Closes-Bug: 1527551
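
The heuristic in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged change: PollResultType and EorHeuristic are hypothetical names, only the EOR timer is modelled (the commit message applies the same rule to the stale cleanup timer), and the reading that an updateResult or deleteResult means the initial search has already been delivered is an interpretation of the commit message.

```cpp
#include <cstdint>

// Hypothetical names throughout; a sketch of the heuristic described in the
// commit message, not the code that was merged.
enum class PollResultType { kSearchResult, kUpdateResult, kDeleteResult };

class EorHeuristic {
public:
    explicit EorHeuristic(int64_t timeout_ms)
        : timeout_ms_(timeout_ms), deadline_ms_(timeout_ms) {}

    // Advance the logical clock; the timer fires once the deadline passes.
    void Tick(int64_t now_ms) {
        if (!eor_computed_ && now_ms >= deadline_ms_) {
            eor_computed_ = true;
        }
    }

    // Handle one poll result received at now_ms.
    void OnPollResult(PollResultType type, int64_t now_ms) {
        Tick(now_ms);
        if (eor_computed_) {
            return;
        }
        if (type == PollResultType::kSearchResult) {
            // Still receiving the initial database: restart the timer.
            deadline_ms_ = now_ms + timeout_ms_;
        } else {
            // updateResult or deleteResult: expire the timer right away
            // instead of letting ongoing changes postpone it.
            eor_computed_ = true;
        }
    }

    bool end_of_rib_computed() const { return eor_computed_; }

private:
    int64_t timeout_ms_;
    int64_t deadline_ms_;
    bool eor_computed_ = false;
};
```

In this model a stream of searchResult messages still postpones EOR, but the first updateResult or deleteResult ends the wait immediately, so continuous add/delete traffic can no longer keep the timer from expiring.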

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/20026
Committed: http://github.org/Juniper/contrail-controller/commit/4c1f1de97a7f2fc9d03b3de24e586cbcba24fe23
Submitter: Zuul
Branch: R3.0

commit 4c1f1de97a7f2fc9d03b3de24e586cbcba24fe23
Author: Nischal Sheth <email address hidden>
Date: Sun May 8 20:46:18 2016 -0700

Refine stale cleanup and EOR heuristics

Restart stale cleanup and EOR timers only if we receive searchResult.
If we receive updateResult or deleteResult, artificially expire the
relevant timers right away.

This handles the corner case where continuous object add/delete can
keep the CN from becoming active.

Note that this fix does not reintroduce bug 1501425.

Change-Id: Ib2f790d323f28c8403f8bcf20ba18fa782a03967
Closes-Bug: 1527551

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/34177
Submitter: Pramodh D'Souza (<email address hidden>)
