SM Storage: Live migration provision fails as the auto-configuration of the NFS VM fails

Bug #1425718 reported by Jeya ganesh babu J
This bug affects 1 person
Affects            Status         Importance  Assigned to     Milestone
Juniper Openstack  Fix Committed  High        Dheeraj Gautam
R2.1               Fix Committed  High        Dheeraj Gautam

Bug Description

Live migration provisioning fails because auto-configuration of the NFS VM does not work. The cause is that the user-data request returns 0 bytes, which makes auto-provisioning fail.

A manual wget run later worked, so rebooting the VM fixed the issue.
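The 0-byte user-data symptom can be guarded against with a retry loop. The following is only a minimal sketch, not the script shipped in the NFS VM: the function name `fetch_user_data`, the retry count, and the delay are assumptions made for illustration, and the URL is the metadata address quoted in the fix commit further down this report.

```python
import time
import urllib.request

# Metadata URL as quoted in the fix commit for this bug.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_user_data(url=USER_DATA_URL, attempts=5, delay=2.0,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """Return user-data bytes, retrying while the response is empty or fails.

    attempts, delay, opener and sleep are hypothetical knobs for this sketch.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=10) as response:
                body = response.read()
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            body = b""
        if body:
            return body
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError("user-data still empty after %d attempts" % attempts)
```

Treating an empty body the same as a network error means the caller either gets real user-data or an explicit failure, instead of silently provisioning with nothing.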

Release note info:

When the issue occurs, the following error is displayed on any of the compute nodes.

Feb 25 11:20:44 cmbu-ceph-perf3 puppet-agent[30259]: (/Stage[storage]/Contrail::Profile::Storage/Contrail::Storage/Contrail::Lib::Storage_common[storage-compute]/Exec[setup-config-storage-compute-live-migration]/returns) + mount 192.168.101.3:/livemnfsvol /var/lib/nova/instances/global
Feb 25 11:20:44 cmbu-ceph-perf3 puppet-agent[30259]: (/Stage[storage]/Contrail::Profile::Storage/Contrail::Storage/Contrail::Lib::Storage_common[storage-compute]/Exec[setup-config-storage-compute-live-migration]/returns) mount.nfs: access denied by server while mounting 192.168.101.3:/livemnfsvol

Workaround:
Restart the livemnfs VM: '# nova stop livemnfs; nova start livemnfs'
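The workaround can be combined with a mount check so that the restart only happens when the mount actually fails. This is a hedged sketch rather than the code from the fix: the function names, the retry count, and the settle time are assumptions, while the `nova stop`/`nova start` and `mount` commands are the ones shown in this report. It assumes the nova CLI is installed, has credentials in its environment, and can be run from the same host as the mount.

```python
import subprocess
import time


def restart_livemnfs(run=subprocess.call, sleep=time.sleep, settle=30):
    """Stop and start the livemnfs instance, as in the manual workaround."""
    run(["nova", "stop", "livemnfs"])
    sleep(settle)  # assumed pause so the stop can complete
    run(["nova", "start", "livemnfs"])
    sleep(settle)  # assumed pause so the VM can boot and export the volume


def mount_with_restart(export, mountpoint, retries=2,
                       run=subprocess.call, sleep=time.sleep, settle=30):
    """Try to mount the NFS export; restart livemnfs after each failed try.

    Returns True once mount exits with status 0, False if every try fails.
    """
    for attempt in range(retries + 1):
        if run(["mount", export, mountpoint]) == 0:
            return True
        if attempt < retries:
            restart_livemnfs(run=run, sleep=sleep, settle=settle)
    return False
```

With the values from the log above, a call would look like `mount_with_restart("192.168.101.3:/livemnfsvol", "/var/lib/nova/instances/global")`.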

description: updated
information type: Proprietary → Public
OpenContrail Admin (ci-admin-f) wrote : R2.1

Review in progress for https://review.opencontrail.org/8869
Submitter: Dheeraj Gautam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/8877
Submitter: Dheeraj Gautam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/8877
Committed: http://github.org/Juniper/contrail-puppet/commit/42aa0dcd66d1501b88595175cde85c3441c0377b
Submitter: Zuul
Branch: master

commit 42aa0dcd66d1501b88595175cde85c3441c0377b
Author: root <email address hidden>
Date: Thu Apr 2 19:36:47 2015 -0700

SM-Storage: Fix multiple bugs in sm-storage (listed below)

Closes-Bug: #1430476
ISSUE: On reboot of a compute node, livemnfsvol was not mounted and
live-migration failed.
ROOT-CAUSE: sm-storage was not adding an entry to /etc/fstab for NFS.
Previously puppet ran all the time and would remount livemnfsvol.
After R2.1, puppet stopped running after provisioning, which uncovered the problem.
FIX:
1. Added an entry to /etc/fstab.
2. Changed the live-migration script to use the data network IP of
live-migration-nfs-host instead of getting the IP from the host command.
TESTING: After provisioning completed, verified that the mount is available again.

Closes-Bug: #1439401
ISSUE: If live-migration is not configured but chassis is configured,
the puppet agent reports a missing resource.
FIX: Added a condition for adding the chassis->live-migration dependency.
TESTING: Configured a cluster without live-migration and with chassis; verified
that no errors are reported.

Closes-Bug: #1436596
ISSUE: Sometimes the storage installation fails and reports that "dpkg --configure -a"
should be run manually to allow the installation to go through.
ROOT-CAUSE: In some cases the compute role is reported as successful
even though one or more statements in it failed, which prevents the
reboot from running. The compute role then created
/etc/contrail/interface_created with contents "2", giving storage the
wrong impression that the system had rebooted. Storage starts
installing packages, but the system reboots in the middle of the
installation, causing the above issue.
FIX: Moved creation of /etc/contrail/interface_created from compute to storage.
TESTING: Reimaged/provisioned the cluster 5-6 times; no issue was reported.

Closes-Bug: #1425718
Closes-Bug: #1439460
ISSUE: At times the livemnfs instance may fail to come up and report an error. This
causes the mount to fail on compute nodes.
Sometimes, even if the livemnfs VM comes up, the mount may still fail. This
can happen if the VM is unable to fetch http://169.254.169.254/latest/user-data;
the internal script then waits there until the VM is rebooted externally.
FIX: Added a mount check to the live-migration script @openstack-role: if the mount
fails, the livemnfs instance is stopped and started again.
TESTING: Checked from the openstack logs that the mount-check code was hit and
the instance was stopped and started.

Change-Id: Ieaa7f5015d18e59b8a5d31fa26d660ad8d6a9878

Changed in juniperopenstack:
status: New → Fix Committed
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/8869
Committed: http://github.org/Juniper/contrail-puppet/commit/af2aa5f9b9ed00eda11d14bfafc4349525ce7a4b
Submitter: Zuul
Branch: R2.1

commit af2aa5f9b9ed00eda11d14bfafc4349525ce7a4b
Author: root <email address hidden>
Date: Thu Apr 2 19:36:47 2015 -0700

SM-Storage: Fix multiple bugs in sm-storage (listed below)

Closes-Bug: #1430476
ISSUE: On reboot of a compute node, livemnfsvol was not mounted and
live-migration failed.
ROOT-CAUSE: sm-storage was not adding an entry to /etc/fstab for NFS.
Previously puppet ran all the time and would remount livemnfsvol.
After R2.1, puppet stopped running after provisioning, which uncovered the problem.
FIX:
1. Added an entry to /etc/fstab.
2. Changed the live-migration script to use the data network IP of
live-migration-nfs-host instead of getting the IP from the host command.
TESTING: After provisioning completed, verified that the mount is available again.

Closes-Bug: #1439401
ISSUE: If live-migration is not configured but chassis is configured,
the puppet agent reports a missing resource.
FIX: Added a condition for adding the chassis->live-migration dependency.
TESTING: Configured a cluster without live-migration and with chassis; verified
that no errors are reported.

Closes-Bug: #1436596
ISSUE: Sometimes the storage installation fails and reports that "dpkg --configure -a"
should be run manually to allow the installation to go through.
ROOT-CAUSE: In some cases the compute role is reported as successful
even though one or more statements in it failed, which prevents the
reboot from running. The compute role then created
/etc/contrail/interface_created with contents "2", giving storage the
wrong impression that the system had rebooted. Storage starts
installing packages, but the system reboots in the middle of the
installation, causing the above issue.
FIX: Moved creation of /etc/contrail/interface_created from compute to storage.
TESTING: Reimaged/provisioned the cluster 5-6 times; no issue was reported.

Closes-Bug: #1425718
Closes-Bug: #1439460
ISSUE: At times the livemnfs instance may fail to come up and report an error. This
causes the mount to fail on compute nodes.
Sometimes, even if the livemnfs VM comes up, the mount may still fail. This
can happen if the VM is unable to fetch http://169.254.169.254/latest/user-data;
the internal script then waits there until the VM is rebooted externally.
FIX: Added a mount check to the live-migration script @openstack-role: if the mount
fails, the livemnfs instance is stopped and started again.
TESTING: Checked from the openstack logs that the mount-check code was hit and
the instance was stopped and started.

Change-Id: Ieaa7f5015d18e59b8a5d31fa26d660ad8d6a9878
