Livemigration provision fails with computes on different subnets

Bug #1388449 reported by Vinod Nair
This bug affects 1 person
Affects              Status          Importance   Assigned to          Milestone
Juniper Openstack    (status tracked in Trunk)
  R2.0               Won't Fix       High         Jeya ganesh babu J
  R2.1               Fix Committed   High         Jeya ganesh babu J
  R2.20              Fix Committed   High         Jeya ganesh babu J
  Trunk              Fix Committed   High         Jeya ganesh babu J

Bug Description

With compute nodes in different subnets, live-migration provisioning fails because the livemnfs VM is not reachable via the VGW.

The hosts and their networks are listed below:

#OPTIONAL SEPARATION OF MANAGEMENT AND CONTROL + DATA
#====================================================
#Control Interface
control_data = {
    host1 : { 'ip': '13.1.0.1/24', 'gw' : '13.1.0.254', 'device':'eth2' },
    host2 : { 'ip': '13.1.0.2/24', 'gw' : '13.1.0.254', 'device':'eth2' },
    host3 : { 'ip': '13.1.0.3/24', 'gw' : '13.1.0.254', 'device':'eth2' },
    host4 : { 'ip': '13.1.0.4/24', 'gw' : '13.1.0.254', 'device':'eth2' },
    host5 : { 'ip': '13.1.0.5/24', 'gw' : '13.1.0.254', 'device':'eth2' },
    host7 : { 'ip': '14.1.0.7/24', 'gw' : '14.1.0.254', 'device':'eth3' },

}

#OPTIONAL STATIC ROUTE CONFIGURATION
#===================================
static_route = {

    host1 : [{ 'ip': '14.1.0.0', 'netmask' : '255.255.255.0', 'gw':'13.1.0.254', 'intf': 'eth2' }],
    host2 : [{ 'ip': '14.1.0.0', 'netmask' : '255.255.255.0', 'gw':'13.1.0.254', 'intf': 'eth2' }],
    host3 : [{ 'ip': '14.1.0.0', 'netmask' : '255.255.255.0', 'gw':'13.1.0.254', 'intf': 'eth2' }],
    host4 : [{ 'ip': '14.1.0.0', 'netmask' : '255.255.255.0', 'gw':'13.1.0.254', 'intf': 'eth2' }],
    host5 : [{ 'ip': '14.1.0.0', 'netmask' : '255.255.255.0', 'gw':'13.1.0.254', 'intf': 'eth2' }],
    host7 : [{ 'ip': '13.1.0.0', 'netmask' : '255.255.255.0', 'gw':'14.1.0.254', 'intf': 'eth3' }],
        }
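
To make the topology explicit, the control/data addresses above can be grouped by subnet. This is a minimal sketch, not part of the provisioning code: the string keys stand in for the testbed's host variables, and `group_by_subnet` is a hypothetical helper. It also shows that the livemnfs address used later in the logs (192.168.101.2) lies in neither compute subnet, so it is only reachable through additional routing.

```python
import ipaddress
from collections import defaultdict

# Control/data addresses copied from the testbed above. The string keys are
# placeholders for the testbed's host variables (an assumption of this sketch).
control_data = {
    'host1': '13.1.0.1/24',
    'host2': '13.1.0.2/24',
    'host3': '13.1.0.3/24',
    'host4': '13.1.0.4/24',
    'host5': '13.1.0.5/24',
    'host7': '14.1.0.7/24',
}


def group_by_subnet(addresses):
    """Map each network (as a string) to the sorted hosts attached to it."""
    groups = defaultdict(list)
    for host, cidr in addresses.items():
        network = ipaddress.ip_interface(cidr).network
        groups[str(network)].append(host)
    return {net: sorted(hosts) for net, hosts in groups.items()}


subnets = group_by_subnet(control_data)
for net, hosts in sorted(subnets.items()):
    print(net, hosts)

# The livemnfs VM address pinged during provisioning (taken from the logs).
livemnfs_ip = ipaddress.ip_address('192.168.101.2')
on_compute_subnet = any(
    livemnfs_ip in ipaddress.ip_network(net) for net in subnets
)
print('livemnfs on a compute subnet:', on_compute_subnet)
```

With more than one key in `subnets`, the setup matches the multi-subnet case this bug describes.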

cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp

auto eth3
iface eth3 inet manual
    pre-up ifconfig eth3 up
    post-down ifconfig eth3 down
    netmask 255.255.255.0
    mtu 9000

auto vhost0
iface vhost0 inet static
    pre-up /opt/contrail/bin/if-vhost0
    netmask 255.255.255.0
    network_name application
    address 14.1.0.7
    dns-search englab.juniper.net spglab.juniper.net juniper.net
    dns-nameservers 10.87.132.142 172.17.28.101

up route add 192.168.101.2 dev vhost0

Provision Logs
2014-10-28 13:46:26:830562: [root@10.87.141.1] out: [13.1.0.1] out: [root@14.1.0.7] out: --- 192.168.101.2 ping statistics ---
2014-10-28 13:46:26:830708: [root@10.87.141.1] out: [13.1.0.1] out: [root@14.1.0.7] out: 10 packets transmitted, 0 received, +10 errors, 100% packet loss, time 9048ms
2014-10-28 13:46:26:830840: [root@10.87.141.1] out: [13.1.0.1] out: [root@14.1.0.7] out: pipe 3
2014-10-28 13:46:26:830975: [root@10.87.141.1] out: [13.1.0.1] out: [root@14.1.0.7] out:
2014-10-28 13:46:26:831099: [root@10.87.141.1] out: [13.1.0.1] out:
2014-10-28 13:46:26:895683: [root@10.87.141.1] out: [13.1.0.1] out:
2014-10-28 13:46:26:895839: [root@10.87.141.1] out: [13.1.0.1] out: Fatal error: run() received nonzero return code 1 while executing!
2014-10-28 13:46:26:895971: [root@10.87.141.1] out: [13.1.0.1] out:
2014-10-28 13:46:26:896101: [root@10.87.141.1] out: [13.1.0.1] out: Requested: ping -c 10 192.168.101.2
2014-10-28 13:46:26:896227: [root@10.87.141.1] out: [13.1.0.1] out: Executed: /bin/bash -l -c "ping -c 10 192.168.101.2"
2014-10-28 13:46:26:896354: [root@10.87.141.1] out: [13.1.0.1] out:
2014-10-28 13:46:26:896479: [root@10.87.141.1] out: [13.1.0.1] out: Aborting.
2014-10-28 13:46:26:896602: [root@10.87.141.1] out: [13.1.0.1] out:
2014-10-28 13:46:26:896725: [root@10.87.141.1] out:
2014-10-28 13:46:26:913085: [root@10.87.141.1] out:
2014-10-28 13:46:26:913215: [root@10.87.141.1] out: Fatal error: run() received nonzero return code 1 while executing!
2014-10-28 13:46:26:913326: [root@10.87.141.1] out:
2014-10-28 13:46:26:913437: [root@10.87.141.1] out: Requested: python /opt/contrail/contrail_installer/contrail_setup_utils/livemnfs-ceph-setup.py --storage-master 13.1.0.1 --storage-setup-mode setup_global --storage-hostnames cs-scale-1 cs-scale-2 cs-scale-3 cs-scale-4 cs-scale-5 cs-scale-7 --storage-hosts 13.1.0.1 13.1.0.2 13.1.0.3 13.1.0.4 13.1.0.5 14.1.0.7 --storage-host-tokens n1keenA n1keenA n1keenA n1keenA n1keenA n1keenA --nfs-livem-subnet 192.168.101.0/24 --nfs-livem-image /store/livemnfs.qcow2 --nfs-livem-host cs-scale-4
2014-10-28 13:46:26:913556: [root@10.87.141.1] out: Executed: /bin/bash -l -c "python /opt/contrail/contrail_installer/contrail_setup_utils/livemnfs-ceph-setup.py --storage-master 13.1.0.1 --storage-setup-mode setup_global --storage-hostnames cs-scale-1 cs-scale-2 cs-scale-3 cs-scale-4 cs-scale-5 cs-scale-7 --storage-hosts 13.1.0.1 13.1.0.2 13.1.0.3 13.1.0.4 13.1.0.5 14.1.0.7 --storage-host-tokens n1keenA n1keenA n1keenA n1keenA n1keenA n1keenA --nfs-livem-subnet 192.168.101.0/24 --nfs-livem-image /store/livemnfs.qcow2 --nfs-livem-host cs-scale-4"
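
The failure above comes down to the ping summary line reporting 100% packet loss from host 14.1.0.7. When scanning long provision logs for this, a small parser can help. This is a rough sketch: `parse_ping_summary` is a hypothetical helper, and the regular expression only targets the summary format seen in this log.

```python
import re

# Matches the ping summary line as it appears in the log above. The
# "+N errors" part is optional because ping omits it when there are none.
PING_SUMMARY = re.compile(
    r'(\d+) packets transmitted, (\d+) received'
    r'(?:, \+(\d+) errors)?, (\d+(?:\.\d+)?)% packet loss'
)


def parse_ping_summary(line):
    """Return the ping counters found in a log line, or None if absent."""
    match = PING_SUMMARY.search(line)
    if match is None:
        return None
    sent, received, errors, loss = match.groups()
    return {
        'sent': int(sent),
        'received': int(received),
        'errors': int(errors or 0),
        'loss_percent': float(loss),
    }


# Summary line copied from the provision log above.
log_line = (
    '2014-10-28 13:46:26:830708: [root@10.87.141.1] out: [13.1.0.1] out: '
    '[root@14.1.0.7] out: 10 packets transmitted, 0 received, +10 errors, '
    '100% packet loss, time 9048ms'
)
summary = parse_ping_summary(log_line)
print(summary)
```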

Tags: storage
Vinod Nair (vinodnair)
Changed in juniperopenstack:
milestone: none → r2.0-fcs
Changed in juniperopenstack:
milestone: r2.0-fcs → none
importance: Undecided → Critical
assignee: nobody → Jeya ganesh babu (jjeya)
summary: - Livemigration provision fails withcomputes on different subnets
+ Livemigration provision fails with computes on different subnets
Changed in juniperopenstack:
importance: Critical → High
information type: Proprietary → Public
OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/10249
Submitter: Jeya ganesh babu (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/10249
Committed: http://github.org/Juniper/contrail-provisioning/commit/0e7a9220eeb53d8149362d395145596f57e5e5f7
Submitter: Zuul
Branch: R2.20

commit 0e7a9220eeb53d8149362d395145596f57e5e5f7
Author: Jeya ganesh babu J <email address hidden>
Date: Tue May 12 12:25:38 2015 -0700

Provision fixes for heartbeat and replica

Closes-Bug: #1446391
Closes-Bug: #1447707
Closes-Bug: #1388449
Closes-Bug: #1446396
Closes-Bug: #1454898
Issues:
- OSDs flap because of an insufficient heartbeat timeout on large clusters.
- The configured replica is overwritten when upgrade or setup_storage is run again.
- Live migration provision doesn't work if there are multiple subnets.
- Upgrade or setup storage creates new mons when the storage-compute order changes in testbed.py.
- If only ssd-disks is specified, the pgs are stuck.
Fix:
- Configured heartbeat based on the replica size.
- Added a configuration variable 'storage_replica_size' in testbed.py to specify the replica.
- Added fix to support multiple subnets for live migration.
- The current monitors were not taken into account for the total monitors. Fix added to take existing monitors into account.
- If there is only 'ssd-disks', code added to treat it as 'disks'.

Change-Id: I6a373416209756e14242ca437ede32db03d9d785
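
For readers unfamiliar with the new setting, the commit message says the replica size is specified through `storage_replica_size` in testbed.py. A minimal illustration follows, assuming it is a plain top-level assignment. The value 2 is only an example and is not taken from this report.

```python
# testbed.py fragment. Placement and value are assumptions for illustration;
# only the variable name comes from the commit message.
# Number of replicas the storage provisioning should configure.
storage_replica_size = 2
```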

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11201
Submitter: Jeya ganesh babu (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11201
Committed: http://github.org/Juniper/contrail-provisioning/commit/cadb798e1cd4a38cf3c9f3088ce6da784fb0cda7
Submitter: Zuul
Branch: master

commit cadb798e1cd4a38cf3c9f3088ce6da784fb0cda7
Author: Jeya ganesh babu J <email address hidden>
Date: Tue Jun 2 14:32:43 2015 -0700

Storage provision fix merge

Closes-Bug: #1446391
Closes-Bug: #1447707
Closes-Bug: #1388449
Closes-Bug: #1446396
Closes-Bug: #1454898
Closes-Bug: #1457704
Closes-Bug: #1459835
Closes-Bug: #1460730
Issues:
- OSDs flap because of an insufficient heartbeat timeout on large clusters.
- The configured replica is overwritten when upgrade or setup_storage is run again.
- Live migration provision doesn't work if there are multiple subnets.
- Upgrade or setup storage creates new mons when the storage-compute order changes in testbed.py.
- If only ssd-disks is specified, the pgs are stuck.
- When an image is added with the http client, glance add fails.
- If the osd is not running, remove disk fails because it tries to stop the osd.
Fix:
- Configured heartbeat based on the replica size.
- Added a configuration variable 'storage_replica_size' in testbed.py to specify the replica.
- Added fix to support multiple subnets for live migration.
- The current monitors were not taken into account for the total monitors. Fix added to take existing monitors into account.
- If there is only 'ssd-disks', code added to treat it as 'disks'.
- The known store configuration is set to use only rbd. This causes even the glance client to use only rbd, blocking http access.
- The quota for cinder is to be set based on the total space and not the current available space.
- Check added to stop the osd only if it is running.

Change-Id: I96a9a070eea1e0461c71566a3889a76f59828ef3

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.1

Review in progress for https://review.opencontrail.org/12113
Submitter: Jeya ganesh babu (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/12113
Committed: http://github.org/Juniper/contrail-provisioning/commit/da76a8e21b91029acb3f12c31e34384b04dc890f
Submitter: Zuul
Branch: R2.1

commit da76a8e21b91029acb3f12c31e34384b04dc890f
Author: Jeya ganesh babu J <email address hidden>
Date: Tue Jun 30 13:56:14 2015 -0700

Storage provision fix merge

Closes-Bug: #1446391
Closes-Bug: #1447707
Closes-Bug: #1388449
Closes-Bug: #1446396
Closes-Bug: #1454898
Closes-Bug: #1459835
Closes-Bug: #1460730
Issues:
- OSDs flap because of an insufficient heartbeat timeout on large clusters.
- The configured replica is overwritten when upgrade or setup_storage is run again.
- Live migration provision doesn't work if there are multiple subnets.
- Upgrade or setup storage creates new mons when the storage-compute order changes in testbed.py.
- If only ssd-disks is specified, the pgs are stuck.
- If the osd is not running, remove disk fails because it tries to stop the osd.
Fix:
- Configured heartbeat based on the replica size.
- Added a configuration variable 'storage_replica_size' in testbed.py to specify the replica.
- Added fix to support multiple subnets for live migration.
- The current monitors were not taken into account for the total monitors. Fix added to take existing monitors into account.
- If there is only 'ssd-disks', code added to treat it as 'disks'.
- The quota for cinder is to be set based on the total space and not the current available space.
- Check added to stop the osd only if it is running.

Change-Id: Ic2555991dc2d1597b867117b7229a7218857b1b9
