OOO:DPDK overcloud deploy fails as some dpdk computes cannot reach controller

Bug #1707022 reported by Vinod Nair
This bug affects 1 person
Affects                                       Status      Importance  Assigned to  Milestone
Juniper Openstack (status tracked in Trunk)
  R3.2                                        Incomplete  High        shajuvk
  Trunk                                       New         High        Vinod Nair

Bug Description

Overcloud deploy fails because some of the DPDK computes cannot reach the controller.

dpdk.ContrailDpdkAllNodesValidationDeployment.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 88bb0d1b-b3b8-4ae0-9a34-e87b488c239a
  status: CREATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
  deploy_stdout: |
    ...
    Ping to 10.0.0.20 failed. Retrying...
    Ping to 10.0.0.20 failed. Retrying...
    Ping to 10.0.0.20 failed. Retrying...
    Ping to 10.0.0.20 failed. Retrying...
    Ping to 10.0.0.20 failed. Retrying...
    Ping to 10.0.0.20 failed. Retrying...
    Ping to 10.0.0.20 failed. Retrying...
    Ping to 10.0.0.20 failed. Retrying...
    Ping to 10.0.0.20 failed. Retrying...
    FAILURE
    (truncated, view all with --long)
  deploy_stderr: |
    10.0.0.20 is not pingable. Local Network: 10.0.0.0/24
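
The validation script itself is not included in this report. As a rough illustration of the retry-then-fail behaviour visible in deploy_stdout, here is a minimal Python sketch. The function names, the retry count and the delay are assumptions made for illustration, and the `ping -c 1 -W 1` flags assume Linux iputils.

```python
import subprocess
import time


def ping_once(host):
    """Return True if a single ICMP echo request gets a reply."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_ping(host, retries=10, delay=1.0, probe=ping_once, sleep=time.sleep):
    """Probe `host` up to `retries` times; return True on the first success.

    `probe` and `sleep` are injectable so the loop can be exercised
    without touching the network.
    """
    for attempt in range(retries):
        if probe(host):
            return True
        if attempt < retries - 1:
            print("Ping to %s failed. Retrying..." % host)
            sleep(delay)
    return False
```

If every probe fails, the caller would report the host as not pingable and exit non-zero, which is what Heat surfaces as CREATE_FAILED here.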

ip link show dev vlan141
7: vlan141@bond0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT qlen 1000
    link/ether 90:e2:ba:5a:89:ec brd ff:ff:ff:ff:ff:ff
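
The NO-CARRIER flag and the LOWERLAYERDOWN state above show that vlan141 has no usable lower layer (bond0). A small sketch of checking this programmatically on each compute is shown below. It assumes the first line of `ip link show dev <name>` has the same layout as in this report; the helper names are hypothetical.

```python
import re

IP_LINK_RE = re.compile(
    r"\s*(\d+):\s+([^:@\s]+)(?:@(\S+?))?:\s+<([^>]*)>\s*(.*)"
)


def parse_ip_link(line):
    """Parse the first line of `ip link show dev <name>` output."""
    m = IP_LINK_RE.match(line)
    if m is None:
        raise ValueError("unrecognized ip link line: %r" % line)
    index, name, parent, flags, rest = m.groups()
    tokens = rest.split()
    # The remainder is a sequence of "key value" pairs (mtu 1500, state ...).
    attrs = dict(zip(tokens[0::2], tokens[1::2]))
    return {
        "index": int(index),
        "name": name,
        "parent": parent,  # None when the device has no @parent suffix
        "flags": flags.split(",") if flags else [],
        "state": attrs.get("state"),
        "mtu": int(attrs["mtu"]) if "mtu" in attrs else None,
    }


def link_is_usable(info):
    """True when the device reports carrier and its lower layer is up."""
    return "NO-CARRIER" not in info["flags"] and info["state"] != "LOWERLAYERDOWN"


SAMPLE = (
    "7: vlan141@bond0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 "
    "qdisc noqueue state LOWERLAYERDOWN mode DEFAULT qlen 1000"
)
```

Running `link_is_usable(parse_ip_link(SAMPLE))` on the line from this report returns False, which matches the failed ping validation.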

The issue seems to be that the PMD has not started LACP:
  Not managed by a supported kernel driver, skipped
2017-07-27 05:11:00,579 VROUTER: Found 2 eth device(s)
2017-07-27 05:11:00,579 VROUTER: Using 4 forwarding lcore(s)
2017-07-27 05:11:00,579 VROUTER: Using 0 IO lcore(s)
2017-07-27 05:11:00,579 VROUTER: Using 5 service lcores
2017-07-27 05:11:00,579 VROUTER: set fd limit to 4096 (prev 4096, max 4096)
2017-07-27 05:11:00,596 VROUTER: Adding VLAN forwarding interface bond0
2017-07-27 05:11:00,596 VROUTER: KNI is not available
2017-07-27 05:11:00,596 VROUTER: creating TAP device bond0
2017-07-27 05:11:00,605 VROUTER: error creating TAP interface bond0: Invalid argument (22)
2017-07-27 05:11:00,605 VROUTER: Error initializing device for VLAN forwarding interface: Invalid argument (22)
2017-07-27 05:11:00,606 VROUTER: Starting NetLink...
2017-07-27 05:11:00,606 VROUTER: Lcore 10: distributing MPLSoGRE packets to [11,12,13]
2017-07-27 05:11:00,606 USOCK: usock_alloc[2b4db40a3700]: new socket FD 44
2017-07-27 05:11:00,606 USOCK: usock_alloc[2b4db40a3700]: setting socket FD 44 send buff size.
Buffer size set to 18320000 (requested 9216000)
2017-07-27 05:11:00,606 VROUTER: NetLink TCP socket FD is 44
2017-07-27 05:11:00,606 VROUTER: uvhost Unix socket FD is 45
2017-07-27 05:11:00,607 UVHOST: Starting uvhost server...
2017-07-27 05:11:00,607 UVHOST: server event FD is 46
2017-07-27 05:11:00,607 UVHOST: server socket FD is 47
2017-07-27 05:11:00,607 VROUTER: Lcore 11: distributing MPLSoGRE packets to [10,12,13]
2017-07-27 05:11:00,607 VROUTER: Lcore 12: distributing MPLSoGRE packets to [10,11,13]
2017-07-27 05:11:00,607 VROUTER: Lcore 13: distributing MPLSoGRE packets to [10,11,12]
2017-07-27 05:11:01,606 VROUTER: Retrying connection for socket 45...
2017-07-27 05:11:01,607 UVHOST: Handling connection FD 47...
2017-07-27 05:11:01,607 UVHOST: FD 47 accepted new NetLink connection FD 48
2017-07-27 05:11:24,499 VROUTER: Adding vif 0 (gen. 1) eth device 0 (PMD) MAC 00:00:00:00:00:00 (vif MAC 90:e2:ba:5a:89:ec)
2017-07-27 05:11:24,499 VROUTER: bond eth device 0 configured MAC 90:e2:ba:5a:89:ec
2017-07-27 05:11:24,499 VROUTER: bond member eth device 1 PCI 0000:81:00.0 MAC 90:e2:ba:5a:89:ec
2017-07-27 05:11:24,499 VROUTER: bond member eth device 1 promisc mode disabled
2017-07-27 05:11:24,499 VROUTER: setup 4 RSS queue(s) and 0 filtering queue(s)
2017-07-27 05:11:24,665 PMD: ixgbe_dev_link_status_print(): Port 1: Link Down
2017-07-27 05:11:24,665 VROUTER: lcore 10 TX to HW queue 0
2017-07-27 05:11:24,665 VROUTER: lcore 11 TX to HW queue 1
2017-07-27 05:11:24,665 VROUTER: lcore 12 TX to HW queue 2
2017-07-27 05:11:24,665 VROUTER: lcore 13 TX to HW queue 3
2017-07-27 05:11:24,665 VROUTER: lcore 8 TX to HW queue 4
2017-07-27 05:11:24,665 VROUTER: lcore 9 TX to HW queue 5
2017-07-27 05:11:24,665 VROUTER: lcore 10 RX from HW queue 0
2017-07-27 05:11:24,665 VROUTER: lcore 11 RX from HW queue 1
2017-07-27 05:11:24,665 VROUTER: lcore 12 RX from HW queue 2
2017-07-27 05:11:24,665 VROUTER: lcore 13 RX from HW queue 3
2017-07-27 05:11:24,675 VROUTER: Adding vif 1 (gen. 2) device vhost0 at eth device 0 MAC 90:e2:ba:5a:89:ec (vif MAC 90:e2:ba:5a:89:ec)
2017-07-27 05:11:24,675 VROUTER: using bond slave eth device 1 MAC 90:e2:ba:5a:89:ec
2017-07-27 05:11:24,675 VROUTER: KNI is not available
2017-07-27 05:11:24,675 VROUTER: creating TAP device vhost0
2017-07-27 05:11:24,676 VROUTER: lcore 10 TX to HW queue 0
2017-07-27 05:11:24,676 VROUTER: lcore 11 TX to HW queue 1
2017-07-27 05:11:24,676 VROUTER: lcore 12 TX to HW queue 2
2017-07-27 05:11:24,676 VROUTER: lcore 13 TX to HW queue 3
2017-07-27 05:11:24,676 VROUTER: lcore 8 TX to HW queue 4
2017-07-27 05:11:24,676 VROUTER: lcore 9 TX to HW queue 5
2017-07-27 05:11:24,676 VROUTER: lcore 10 RX from HW queue 0
2017-07-27 05:11:24,706 VROUTER: Notification received for vhost0
2017-07-27 05:11:24,706 VROUTER: Configuring eth device 0 DOWN
2017-07-27 05:11:26,045 PMD: ixgbe_dev_link_status_print(): Port 1: Link Up - speed 10000 Mbps - full-duplex
2017-07-27 05:11:27,046 PMD: ixgbe_dev_link_status_print(): Port 1: Link Up - speed 10000 Mbps - full-duplex
2017-07-27 05:11:50,750 VROUTER: Notification received for vhost0
2017-07-27 05:11:50,750 VROUTER: Configuring eth device 0 UP
2017-07-27 05:11:50,970 PMD: ixgbe_dev_link_status_print(): Port 1: Link Down
2017-07-27 05:11:52,298 PMD: ixgbe_dev_link_status_print(): Port 1: Link Up - speed 10000 Mbps - full-duplex
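
To make the sequence in the log above easier to follow, here is a sketch that extracts the PMD link transitions and the VROUTER error lines. The regular expressions assume the line layout seen in this excerpt only. Note that the last transition is Link Up on port 1 while vlan141 still reports NO-CARRIER; that is consistent with the suspicion about LACP but does not by itself confirm it, since this log contains no LACP state messages.

```python
import re

LINK_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) PMD: \w+\(\): "
    r"Port (\d+): Link (Up|Down)"
)
ERROR_RE = re.compile(r"^\S+ \S+ VROUTER: .*\berror\b", re.IGNORECASE)


def summarize(log_text):
    """Return (link_transitions, error_lines) found in a vrouter DPDK log."""
    transitions = []
    errors = []
    for line in log_text.splitlines():
        m = LINK_RE.match(line)
        if m:
            transitions.append((m.group(1), int(m.group(2)), m.group(3)))
        elif ERROR_RE.match(line):
            errors.append(line)
    return transitions, errors


# Lines copied from the log in this report.
SAMPLE_LOG = """\
2017-07-27 05:11:00,605 VROUTER: error creating TAP interface bond0: Invalid argument (22)
2017-07-27 05:11:00,605 VROUTER: Error initializing device for VLAN forwarding interface: Invalid argument (22)
2017-07-27 05:11:24,665 PMD: ixgbe_dev_link_status_print(): Port 1: Link Down
2017-07-27 05:11:26,045 PMD: ixgbe_dev_link_status_print(): Port 1: Link Up - speed 10000 Mbps - full-duplex
2017-07-27 05:11:50,970 PMD: ixgbe_dev_link_status_print(): Port 1: Link Down
2017-07-27 05:11:52,298 PMD: ixgbe_dev_link_status_print(): Port 1: Link Up - speed 10000 Mbps - full-duplex
"""
```

For the sample, `summarize(SAMPLE_LOG)` yields four transitions on port 1 (Down, Up, Down, Up) and the two TAP-related error lines logged at 05:11:00,605.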

/opt/contrail/bin/dpdk_nic_bind.py -s

Network devices using DPDK-compatible driver
============================================
0000:81:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=uio_pci_generic unused=ixgbe

Network devices using kernel driver
===================================
0000:03:00.0 'I350 Gigabit Network Connection' if=eno1 drv=igb unused=uio_pci_generic *Active*
0000:03:00.1 'I350 Gigabit Network Connection' if=enp3s0f1 drv=igb unused=uio_pci_generic *Active*
0000:81:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=enp129s0f1 drv=ixgbe unused=uio_pci_generic

Other network devices
=====================
<none>
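
The status output above shows one 82599ES port (0000:81:00.0) bound to uio_pci_generic and the other (0000:81:00.1) still on the kernel ixgbe driver. Whether that matches the intended bond membership is not stated in this report. A sketch for turning this output into records that can be compared across computes follows; it assumes the text layout shown here, and the function name is hypothetical.

```python
import re

DEVICE_RE = re.compile(
    r"^(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s+"
    r"'(?P<desc>[^']*)'\s*(?P<rest>.*)$"
)


def parse_status(text):
    """Parse `dpdk_nic_bind.py -s` output into a list of device dicts."""
    devices = []
    section = None
    lines = [raw.strip() for raw in text.splitlines()]
    for i, line in enumerate(lines):
        # A section title is the line directly above a row of '=' characters.
        if i + 1 < len(lines) and set(lines[i + 1]) == {"="}:
            section = line
            continue
        m = DEVICE_RE.match(line)
        if not m:
            continue
        dev = {"pci": m.group("pci"), "desc": m.group("desc"),
               "section": section, "active": False}
        for token in m.group("rest").split():
            if token == "*Active*":
                dev["active"] = True
            elif "=" in token:
                key, value = token.split("=", 1)
                dev[key] = value  # e.g. drv, unused, if
        devices.append(dev)
    return devices


# Output copied from this report.
STATUS = """\
Network devices using DPDK-compatible driver
============================================
0000:81:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=uio_pci_generic unused=ixgbe

Network devices using kernel driver
===================================
0000:03:00.0 'I350 Gigabit Network Connection' if=eno1 drv=igb unused=uio_pci_generic *Active*
0000:03:00.1 'I350 Gigabit Network Connection' if=enp3s0f1 drv=igb unused=uio_pci_generic *Active*
0000:81:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=enp129s0f1 drv=ixgbe unused=uio_pci_generic

Other network devices
=====================
<none>
"""
```

Filtering the result on `"82599ES" in dev["desc"]` gives the two 10G ports with their current `drv` values, which makes the split between DPDK and kernel drivers easy to spot.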

Vinod Nair (vinodnair)
tags: added: blocker
description: updated
Jeba Paulaiyan (jebap) wrote :

Workaround: Restart vrouter and re-deploy

tags: removed: blocker
Jeba Paulaiyan (jebap)
tags: added: releasenote
Jeba Paulaiyan (jebap) wrote :

Release-Notes:
==============
While deploying a DPDK compute node in an RHOSP10 environment, the deployment might stop because the vrouter cannot reach the controller. Restarting the affected vrouter and redeploying brings the cluster back up.

information type: Proprietary → Public
Kiran (kiran-kn80) wrote :

Vinod, can you please try to recreate this and give me the setup?
