[4.0.0.0-3038] Provisioning fails as 'fab setup_all' fails during prov_config()

Bug #1663464 reported by Pavana
This bug affects 2 people
Affects            | Status                  | Importance | Assigned to | Milestone
Juniper Openstack  | Status tracked in Trunk |            |             |
Trunk              | Fix Committed           | High       | Atul Moghe  |

Bug Description

The connection appears to be lost: 'Write failed: Broken pipe' is seen when 'python provision_config_node.py' is issued.
Seen on all OpenStack versions on Ubuntu 14.04.

No env.dpdk section in testbed file, skipping the configuration...
2017-02-10 06:05:24:819235: [root@10.204.216.232] Executing task 'prov_config'
2017-02-10 06:05:24:819529: [root@10.204.216.232] sudo: hostname
2017-02-10 06:05:24:819950: [root@10.204.216.232] out: nodek12
2017-02-10 06:05:24:902723: [root@10.204.216.232] out:
2017-02-10 06:05:24:904254:
2017-02-10 06:05:24:919134: [root@10.204.216.232] sudo: python provision_config_node.py --api_server_ip 10.204.216.232 --host_name nodek12 --host_ip 10.204.216.232 --oper add --admin_user admin --admin_password contrail123 --admin_tenant_name admin
Write failed: Broken pipe
+ rv=255
+ return 255
+ debug_and_die 'fab setup_interface or setup_all task failed'
+ local 'message=fab setup_interface or setup_all task failed'
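
For triage, it can help to pull out the last remote command that fab logged before the connection dropped. The following is a minimal sketch, assuming the console log uses the "[host] sudo: command" and "[host] run: command" line shapes quoted in this report; the function name is hypothetical and not part of the provisioning scripts.

```python
import re

# Matches fab command lines as quoted in this report, e.g.
# "2017-02-10 06:05:24:819529: [root@10.204.216.232] sudo: hostname"
_CMD_RE = re.compile(r"\[(?P<host>[^\]]+)\] (?:sudo|run): (?P<cmd>.+)$")


def last_command_before_broken_pipe(log_text):
    """Return (host, command) for the last remote command logged before
    'Write failed: Broken pipe'. Returns None if the marker is absent or
    no command line precedes it."""
    last = None
    for line in log_text.splitlines():
        if "Write failed: Broken pipe" in line:
            return last
        match = _CMD_RE.search(line)
        if match:
            last = (match.group("host"), match.group("cmd"))
    return None
```

Fed the excerpt above, this would report the 'python provision_config_node.py ...' invocation on root@10.204.216.232 as the last command issued before the pipe broke.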

Link to the run - http://anamika.englab.juniper.net:8080/view/WebUISanityView/job/ubuntu-14-04_mitaka_Single_Node_UI_Sanity/131/console

Pavana (pavanap)
tags: added: blocker
information type: Proprietary → Public

Pavana (pavanap)
summary:
- [4.0.0.0-3038] Provisioning fails as 'fab setup_interface' fails during prov_config()
+ [4.0.0.0-3038] Provisioning fails as 'fab setup_all' fails during prov_config()

Sachin Bansal (sbansal) wrote :

Is this reproducible? Could you please confirm it wasn't a network issue? I see this in the logs:

ssh: connect to host 10.204.216.232 port 22: Connection timed out
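
One quick way to separate a plain network problem from a node that has dropped off the network is to probe the SSH port from the build host while provisioning runs. This is only a sketch using the Python standard library; the helper name and the timeout value are assumptions, not part of the fab scripts.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling ssh_port_reachable("10.204.216.232") before and after the prov_config task would show whether port 22 stops answering at that point.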

Pavana (pavanap) wrote :

Yes, this is reproducible (I see it every time I provision), and there wasn't any network issue. It is also seen on build 3039:

2017-02-14 02:00:58:661673: [root@10.204.216.232] out: [localhost] local: chkconfig nova-compute on
2017-02-14 02:00:58:661934: [root@10.204.216.232] out: [localhost] local: service nova-compute restart
2017-02-14 02:00:58:662075: [root@10.204.216.232] out: nova-compute stop/waiting
2017-02-14 02:00:58:694211: [root@10.204.216.232] out: nova-compute start/running, process 27934
2017-02-14 02:00:58:726012: [root@10.204.216.232] out: [localhost] local: chkconfig supervisor-vrouter on
2017-02-14 02:00:58:726164: [root@10.204.216.232] out:
2017-02-14 02:00:58:757877:
2017-02-14 02:00:58:758489: No env.dpdk section in testbed file, skipping the configuration...
2017-02-14 02:00:58:758699: [root@10.204.216.232] Executing task 'prov_config'
2017-02-14 02:00:58:759079: [root@10.204.216.232] sudo: hostname
2017-02-14 02:00:58:759447: [root@10.204.216.232] out: nodek12
2017-02-14 02:00:58:809587: [root@10.204.216.232] out:
2017-02-14 02:00:58:809964: Write failed: Broken pipe
+ rv=255
+ return 255
+ debug_and_die 'fab setup_interface or setup_all task failed'
+ local 'message=fab setup_interface or setup_all task failed'

Ignatious Johnson Christopher (ijohnson-x) wrote :

Hi Nipa,

As I said earlier in the afternoon, the vrouter kernel module is getting loaded into the kernel even before the reboot.
This causes vhost0 to be created with the same IP as the original physical interface (vhost0 and <physicalInt> hold the same IP).
That results in two routes to the same network, one via vhost0 and the other via <physicalInt>, so the compute nodes become unreachable.

We will need the agent/vrouter team's help to debug this and identify what causes the kernel module to be loaded.

Pavana/Nipa,

If you have the setup in the failed state, please share it with the agent/vrouter team.

Thanks,
Ignatious
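
The condition described in the comment above (vrouter module already loaded, and vhost0 holding the same IP as the physical interface) can be checked on a failed node with a small script. The sketch below is an assumption-laden illustration: it parses text in the format of `ip -o -4 addr show` and of /proc/modules, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# One line of `ip -o -4 addr show`, e.g.
# "2: eth0    inet 10.0.0.5/24 brd 10.0.0.255 scope global eth0"
_ADDR_RE = re.compile(r"^\d+:\s+(\S+)\s+inet\s+(\d+\.\d+\.\d+\.\d+)/\d+")


def duplicate_ipv4(ip_addr_output):
    """Map each IPv4 address held by more than one interface to those interfaces."""
    owners = defaultdict(list)
    for line in ip_addr_output.splitlines():
        match = _ADDR_RE.match(line.strip())
        if match:
            iface, addr = match.groups()
            if iface not in owners[addr]:
                owners[addr].append(iface)
    return {addr: ifaces for addr, ifaces in owners.items() if len(ifaces) > 1}


def module_loaded(proc_modules_text, name="vrouter"):
    """Return True if `name` is the first field of any /proc/modules-style line."""
    return any(line.split()[:1] == [name] for line in proc_modules_text.splitlines())
```

On a node in the failed state, a non-empty result from duplicate_ipv4 that lists both vhost0 and the physical interface, together with module_loaded returning True before the reboot, would match the diagnosis given here.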

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/29046
Submitter: Atul Moghe (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/29046
Committed: http://github.org/Juniper/contrail-provisioning/commit/1a1deea8aa2a60dd5f0cce204bc894fbb16d1457
Submitter: Zuul (<email address hidden>)
Branch: master

commit 1a1deea8aa2a60dd5f0cce204bc894fbb16d1457
Author: Atul Moghe <email address hidden>
Date: Wed Feb 22 23:04:21 2017 +0000

removed vrouter restart from compute-server-setup,

this was causing to loose connectivity even before setup is completed
Closes-Bug: #1663464

Change-Id: Id2bc15205e5b37b7f609a13a1b1964680ab80777
