Bridge table entry of allowed-address-pair is corrupted after live-migration

Bug #1541658 reported by Daisuke Nakajima
This bug affects 1 person
Affects              Status                     Importance  Assigned to  Milestone
Juniper Openstack    Status tracked in Trunk
  R2.20              Fix Committed              High        Naveen N
  R2.21.x            Fix Committed              High        Naveen N
  R2.22.x            Fix Committed              High        Naveen N
  Trunk              Fix Committed              High        Naveen N

Bug Description

There are three compute nodes, and two of them each host one virtual machine running VRRP.
We live-migrated the VRRP master VM to another compute node, i.e. not the one where the VRRP standby is running.
(Note: this issue is not seen if the VRRP master is moved to the same compute node where the VRRP standby is running.)
We also captured traffic on the virtual machine. It received two ARP requests from the vrouter-agent, one for its instance IP and one for the VIP, and it replied to both.
As a result, the inet table appears correct, but the bridge table appears wrong.

In the logs below, the nexthop of 10.0.1.254 is 31, which is a local nexthop. The nexthop of 0:0:5e:0:1:a, however, is 21, which points to a remote compute node.
The bridge entry appears to be corrupted.
Compute-1(VRRP Master)---------
root@sv-6:~# rt --dump 1 | egrep "1.254/"
10.0.1.254/32 32 P - 31 0:0:5e:0:1:a(86308) <<<<<
root@sv-6:~# rt --dump 1 --family bridge
Kernel L2 Bridge table 0/1

Flags: L=Label Valid, Df=DHCP flood

Index DestMac Flags Label/VNID Nexthop
6444 52:54:0:6a:c0:92 Df - 3
21424 0:1:0:0:5:78 LDf 4 25
51272 2:51:b:a0:7c:b9 - 34
52932 2:97:39:39:bb:d0 LDf 19 21
86308 0:0:5e:0:1:a LDf 19 21 <<<<<
97192 ff:ff:ff:ff:ff:ff LDf 4 37
237456 2:d5:ac:a1:72:2 LDf 20 20
252916 0:0:5e:0:1:0 Df - 3
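
The inconsistency above can be checked mechanically by comparing an inet route with the bridge entry it is stitched to (the index in parentheses). The following is a minimal sketch under stated assumptions. The column layout is inferred from the dumps in this report. An entry is treated as "remote" when it carries the L (Label Valid) flag, as the reporter's reasoning suggests. This is not an official vrouter tool, and the function names are hypothetical. Raw nexthop IDs are deliberately not compared, because in the corrected dump the inet route (nexthop 31) and the bridge entry (nexthop 34) are both local but use different nexthop objects.

```python
import re
from typing import Dict

# Matches the stitched MAC at the end of an inet route, e.g. "0:0:5e:0:1:a(86308)".
STITCH_RE = re.compile(r"^([0-9a-f:]+)\((\d+)\)$")


def parse_bridge_table(dump: str) -> Dict[int, bool]:
    """Map bridge-table index -> True if the entry has the L (Label Valid) flag."""
    entries: Dict[int, bool] = {}
    for line in dump.splitlines():
        tokens = line.split()
        # Data rows start with a numeric index; headers and blank lines are skipped.
        if len(tokens) < 4 or not tokens[0].isdigit():
            continue
        # Rows with an empty Flags column have only 4 tokens: index, MAC, label, nexthop.
        flags = tokens[2] if len(tokens) == 5 else ""
        entries[int(tokens[0])] = "L" in flags
    return entries


def check_route(route_line: str, bridge: Dict[int, bool]) -> bool:
    """True if an inet route and its stitched bridge entry agree on local vs. remote."""
    tokens = route_line.split()
    # Assumption: the L flag (label valid) marks a path to a remote compute node.
    inet_remote = "L" in tokens[2]
    match = STITCH_RE.match(tokens[-1])
    if match is None:
        raise ValueError(f"no stitched MAC in route line: {route_line!r}")
    bridge_remote = bridge[int(match.group(2))]
    return inet_remote == bridge_remote


if __name__ == "__main__":
    corrupted = """\
Index DestMac Flags Label/VNID Nexthop
51272 2:51:b:a0:7c:b9 - 34
86308 0:0:5e:0:1:a LDf 19 21
"""
    route = "10.0.1.254/32 32 P - 31 0:0:5e:0:1:a(86308)"
    print(check_route(route, parse_bridge_table(corrupted)))  # False: inet local, bridge remote
```

Run against the corrected Compute-1 dump or the Compute-3 dump, the same check returns True.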

The routing table was corrected after VRRP mastership was switched to the other virtual machine and then back to the original one.
Compute-1(VRRP Master)---------
root@sv-6:~# rt --dump 1 | grep "1.254/"
10.0.1.254/32 32 P - 31 0:0:5e:0:1:a(86308) <<<<<
root@sv-6:~# rt --dump 1 --family bridge
Kernel L2 Bridge table 0/1

Flags: L=Label Valid, Df=DHCP flood

Index DestMac Flags Label/VNID Nexthop
6444 52:54:0:6a:c0:92 Df - 3
21424 0:1:0:0:5:78 LDf 4 25
51272 2:51:b:a0:7c:b9 - 34
52932 2:97:39:39:bb:d0 LDf 19 21
86308 0:0:5e:0:1:a Df - 34 <<<<<
97192 ff:ff:ff:ff:ff:ff LDf 4 37
237456 2:d5:ac:a1:72:2 LDf 20 20
252916 0:0:5e:0:1:0 Df - 3

Compute-3(Client)---------
root@sv-8:~# rt --dump 1 | grep .1.254/
10.0.1.254/32 32 LP 18 19 0:0:5e:0:1:a(86308)
root@sv-8:~# rt --dump 1 --family bridge
Kernel L2 Bridge table 0/1

Flags: L=Label Valid, Df=DHCP flood

Index DestMac Flags Label/VNID Nexthop
21424 0:1:0:0:5:78 LDf 4 21
51272 2:51:b:a0:7c:b9 LDf 19 19
52932 2:97:39:39:bb:d0 LDf 19 23
86308 0:0:5e:0:1:a LDf 19 19
97192 ff:ff:ff:ff:ff:ff LDf 4 28
111580 52:54:0:9b:10:5e Df - 3
237456 2:d5:ac:a1:72:2 - 41
252916 0:0:5e:0:1:0 Df - 3

Tags: vrouter
information type: Proprietary → Public Security
information type: Public Security → Public
Changed in juniperopenstack:
assignee: nobody → Hari Prasad Killi (haripk)
tags: added: vrouter
chhandak (chhandak) wrote : Re: [Bug 1541658] Re: bridge table of allowed-address-pair is corruption after live-migration

Hi Daisuke-San,

We tried the same scenario but did not see the corruption. After migration to the new compute node, the VRRP MAC (allowed address pair) still points to a local nexthop. I tested with the latest mainline build 2711 (Kilo).

Please find our observations below, and let us know if you are doing something different.

Thanks and Regards,
Chhandak

VRRP master
--------------
root@vm-test-1:/home/ubuntu# ip a show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 02:1e:e9:48:18:82 brd ff:ff:ff:ff:ff:ff
    inet 1.1.1.3/24 brd 1.1.1.255 scope global eth0
    inet 1.1.1.50/32 scope global eth0   <<< VIP
    inet6 fe80::1e:e9ff:fe48:1882/64 scope link
       valid_lft forever preferred_lft forever

Corresponding compute node
----------------------------------
root@cmbu-ceph-perf2:~# rt --dump 3 --family bridge   <<< cmbu-ceph-perf2 is the current compute
Kernel L2 Bridge table 0/3

Flags: L=Label Valid, Df=DHCP flood

Index DestMac Flags Label/VNID Nexthop
2168 2:1e:e9:48:18:82 - 72
5084 ff:ff:ff:ff:ff:ff LDf 6 75
52076 0:0:5e:0:1:0 Df - 3
104012 2:6d:31:f9:dd:8e LDf 21 50
158240 0:0:5e:0:1:4 Df - 72   <<< VRRP MAC
178124 0:25:90:35:8a:1e Df - 3
root@cmbu-ceph-perf2:~# nh --get 72
Id:72 Type:Encap Fmly:AF_BRIDGE Rid:0 Ref_cnt:4 Vrf:3
              Flags:Valid, Policy,
              EncapFmly:0806 Oif:6 Len:14
              Encap Data: 02 1e e9 48 18 82 00 00 5e 00 01 00 08 00   <<< pointing local
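
Why "pointing local" follows from the nexthop output is easier to see once the encap bytes are decoded. The sketch below assumes that the 14 bytes of Encap Data form a standard Ethernet II header (destination MAC, source MAC, EtherType), which is consistent with the Len:14 field. The helper names are hypothetical.

```python
from typing import Tuple


def fmt_mac(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def decode_ethernet_header(encap_hex: str) -> Tuple[str, str, int]:
    """Split 14 bytes of encap data into (destination MAC, source MAC, EtherType)."""
    data = bytes.fromhex(encap_hex)  # spaces between byte pairs are ignored
    if len(data) != 14:
        raise ValueError(f"expected 14 bytes, got {len(data)}")
    dst, src = data[0:6], data[6:12]
    ethertype = int.from_bytes(data[12:14], "big")
    return fmt_mac(dst), fmt_mac(src), ethertype


if __name__ == "__main__":
    dst, src, etype = decode_ethernet_header("02 1e e9 48 18 82 00 00 5e 00 01 00 08 00")
    vm_mac = "02:1e:e9:48:18:82"  # link/ether of eth0 on vm-test-1, from `ip a show` above
    print(dst, src, hex(etype))  # 02:1e:e9:48:18:82 00:00:5e:00:01:00 0x800
    print("points to local VM:", dst == vm_mac)  # True
```

The destination MAC equals the VM's own eth0 address, so frames sent through nexthop 72 are delivered directly to the local VM rather than tunneled to another compute node.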

root@vm-test-1:/home/ubuntu# tcpdump -ni eth0 not port 22
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
05:03:52.900295 IP6 fe80::1e:e9ff:fe48:1882.546 > ff02::1:2.547: dhcp6 request
05:03:52.900663 IP6 ::.547 > fe80::1e:e9ff:fe48:1882.546: dhcp6 reply
05:03:52.922331 IP6 fe80::1e:e9ff:fe48:1882 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
05:03:53.646801 IP 1.1.1.4 > 1.1.1.50: ICMP echo request, id 1944, seq 684, length 64
05:03:53.646825 IP 1.1.1.50 > 1.1.1.4: ICMP echo reply, id 1944, seq 684, length 64
05:03:54.647946 IP 1.1.1.4 > 1.1.1.50: ICMP echo request, id 1944, seq 685, length 64
05:03:54.647964 IP 1.1.1.50 > 1.1.1.4: ICMP echo reply, id 1944, seq 685, length 64   <<< responding to ping for the VIP
05:03:54.702586 IP6 fe80::6d:31

Migrating VRRP Master VM to New Compute
-------------------------------------------------------------------
root@cmbu-ceph-perf1:~# nova live-migration 6b483f3b-daaf-44df-ba16-fe0fb7daf375 cmbu-ceph-perf3

root@cmbu-ceph-perf3:~# rt --dump 1...

Daisuke Nakajima (dnakajima) wrote :

Hi Chhandak,

We use 2.21.1 build 22. Did you move the VM to another compute node where the VRRP standby was running?
We saw this issue when the VRRP master had been moved to another compute node where the VRRP standby had been running.

Best regards,
Daisuke

chhandak (chhandak) wrote :

Hi Daisuke-San,

The original bug description states that the issue is not seen when the master is moved to the compute node where the standby is running.
Can you please explain the exact sequence?

"Note, this issue is not seen if Master VRRP moves to same compute node where Standby VRRP is running"

Thanks and Regards,
Chhandak

Daisuke Nakajima (dnakajima) wrote :

Hi, please follow the procedure below on R2.21.1-22:
1) Compute 1 hosts VM1 (VRRP master), Compute 2 hosts VM2 (VRRP standby), and there is a third node, Compute 3.
2) Live-migrate VM1 to Compute 3.

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/17272
Submitter: Naveen N (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/17279
Submitter: Naveen N (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/17280
Submitter: Naveen N (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/17281
Submitter: Naveen N (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/17272
Committed: http://github.org/Juniper/contrail-controller/commit/2cbe98abec7b591924244267b3fb53fd8ba90ea8
Submitter: Zuul
Branch: R2.21.x

commit 2cbe98abec7b591924244267b3fb53fd8ba90ea8
Author: Naveen N <email address hidden>
Date: Tue Feb 16 14:46:23 2016 +0530

* Resync path preference value upon delete and readd of path
In case of L2 evpn route, upon ethernet tag, policy change or
preference change agent was deleting and readding local vm
path. In this scenario path preference values were lost and
bridge table was pointing to backup nexthop due to preference
mismatch.
Closes-bug:#1541658

Change-Id: I1888a74535b464e2ca7bdb33cea6d3ac97326206
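
The commit message describes the failure only in prose. The toy model below is a minimal sketch of that mechanism under stated assumptions. It is not the vrouter-agent's actual code. The classes, peer names, and DEFAULT/LOW/HIGH preference values are hypothetical, and the only selection rule modeled is "highest preference wins". Nexthop IDs 34 (local) and 21 (remote) are borrowed from the Compute-1 dumps for readability. The model shows how deleting and re-adding the local path without carrying over its learned preference lets the backup path win, and how resyncing the preference avoids that.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical preference values, for illustration only.
DEFAULT_PREF = 0   # value a freshly created path starts with
LOW_PREF = 100     # e.g. a backup path learned from a remote compute node
HIGH_PREF = 200    # e.g. a local path whose VM answered ARP for the VIP


@dataclass
class Path:
    peer: str
    nexthop: int
    preference: int = DEFAULT_PREF


@dataclass
class BridgeRoute:
    paths: Dict[str, Path] = field(default_factory=dict)

    def add_path(self, path: Path) -> None:
        self.paths[path.peer] = path

    def delete_path(self, peer: str) -> Optional[Path]:
        return self.paths.pop(peer, None)

    def active_nexthop(self) -> int:
        # In this model, the path with the highest preference is installed.
        return max(self.paths.values(), key=lambda p: p.preference).nexthop


def readd_local_path(route: BridgeRoute, peer: str, nexthop: int, resync: bool) -> None:
    """Delete and re-add the local VM path, as on an ethernet-tag or policy change."""
    old = route.delete_path(peer)
    new = Path(peer, nexthop)  # re-created with DEFAULT_PREF
    if resync and old is not None:
        new.preference = old.preference  # carry the learned preference over
    route.add_path(new)


def build_route() -> BridgeRoute:
    route = BridgeRoute()
    route.add_path(Path("local-vm", nexthop=34, preference=HIGH_PREF))
    route.add_path(Path("bgp", nexthop=21, preference=LOW_PREF))
    return route


if __name__ == "__main__":
    buggy = build_route()
    readd_local_path(buggy, "local-vm", 34, resync=False)
    print(buggy.active_nexthop())  # 21: the backup (remote) nexthop wins
    fixed = build_route()
    readd_local_path(fixed, "local-vm", 34, resync=True)
    print(fixed.active_nexthop())  # 34: the local nexthop is kept
```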

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/17279
Committed: http://github.org/Juniper/contrail-controller/commit/ba818ef1aeef2c7bdf240935a9733b4fb17b06a2
Submitter: Zuul
Branch: R2.22.x

commit ba818ef1aeef2c7bdf240935a9733b4fb17b06a2
Author: Naveen N <email address hidden>
Date: Tue Feb 16 16:13:41 2016 +0530

* Resync path preference value upon delete and readd of path
In case of L2 evpn route, upon ethernet tag, policy change or
preference change agent was deleting and readding local vm
path. In this scenario path preference values were lost and
bridge table was pointing to backup nexthop due to preference
mismatch.
Closes-bug:#1541658

Change-Id: Ifad2f0970dde80f1e6b938de3c90dbf72c5321ab

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/17281
Committed: http://github.org/Juniper/contrail-controller/commit/2251953984db988c4a22c93d72224c7c2b3985d8
Submitter: Zuul
Branch: master

commit 2251953984db988c4a22c93d72224c7c2b3985d8
Author: Naveen N <email address hidden>
Date: Tue Feb 16 16:19:52 2016 +0530

* Resync path preference value upon delete and readd of path
In case of L2 evpn route, upon ethernet tag, policy change or
preference change agent was deleting and readding local vm
path. In this scenario path preference values were lost and
bridge table was pointing to backup nexthop due to preference
mismatch.
Closes-bug:#1541658

Change-Id: I298ff975372c509127e987bb41068dd799b56787

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/17280
Committed: http://github.org/Juniper/contrail-controller/commit/ce68a1b5a49f0e527ddc1b8386d2da1bf140ad5f
Submitter: Zuul
Branch: R2.20

commit ce68a1b5a49f0e527ddc1b8386d2da1bf140ad5f
Author: Naveen N <email address hidden>
Date: Tue Feb 16 16:16:25 2016 +0530

* Resync path preference value upon delete and readd of path
In case of L2 evpn route, upon ethernet tag, policy change or
preference change agent was deleting and readding local vm
path. In this scenario path preference values were lost and
bridge table was pointing to backup nexthop due to preference
mismatch.
Closes-bug:#1541658

Change-Id: I92f2b96eab7e3ef36546761a5d1dd6cdac77b230
