Communication lost to N1kv vlan provider network after tenant network create/delete

Bug #1414060 reported by Richard Winters
Affects: networking-cisco
Status: Confirmed
Importance: Undecided
Assigned to: Abhishek Raut
Milestone: (none)

Bug Description

OpenStack Version: Kilo

$ nova-manage version
2015.1

$ neutron --version
2.3.10

Communication to the CSR is lost for a short period after creating a network, causing errors in the log when pings fail.

1. Why is communication lost for about 25 seconds after creating the network?
2. If it is not service impacting, can we get rid of the error log?

Steps to Repro:
  1. From the Controller node setup a continuous ping to the CSR management interface
  2. Create a network.
  3. Immediately attach the network to the router (CSR)
  4. Check the log for errors
  5. Check the ping stats
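The steps above can be scripted to measure the outage window directly. A minimal sketch, assuming iputils ping is available; the function names and the one-minute watch loop are illustrative, only the CSR mgmt address 10.0.100.10 comes from this report:

```python
import subprocess
import time

def ping_once(host, timeout=1):
    """Single echo request; True if it was answered."""
    return subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout), host],
        capture_output=True,
    ).returncode == 0

def longest_outage(samples):
    """Given [(timestamp, reachable), ...], return the longest
    stretch of consecutive failures, in seconds."""
    longest = 0.0
    start = None
    for t, ok in samples:
        if not ok and start is None:
            start = t                      # outage begins
        elif ok and start is not None:
            longest = max(longest, t - start)
            start = None                   # outage ends
    if start is not None and samples:
        longest = max(longest, samples[-1][0] - start)
    return longest

if __name__ == "__main__":
    samples = []
    for _ in range(60):  # watch the CSR mgmt IP for about a minute
        samples.append((time.time(), ping_once("10.0.100.10")))
        time.sleep(1)
    print("longest outage: %.1fs" % longest_outage(samples))
```

Running this while performing step 2 (network create) should show a gap close to the ~25 seconds reported above.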

2015-01-21 10:48:10.029 DEBUG neutron.plugins.cisco.cfg_agent.service_helpers.routing_svc_helper [-] Routing service processing started from (pid=13719) process_service /opt/stack/neutron/neutron/plugins/cisco/cfg_agent/service_helpers/routing_svc_helper.py:162
2015-01-21 10:48:10.029 DEBUG neutron.plugins.cisco.cfg_agent.service_helpers.routing_svc_helper [-] Updated routers:[u'56f2cfbc-61c6-45dc-94d5-0cbb08b05053'] from (pid=13719) process_service /opt/stack/neutron/neutron/plugins/cisco/cfg_agent/service_helpers/routing_svc_helper.py:179
2015-01-21 10:48:10.248 DEBUG neutron.agent.linux.utils [-] Running command: ['ping', '-c', '5', '-W', '1', '-i', '0.2', '10.0.100.10'] from (pid=13719) create_process /opt/stack/neutron/neutron/agent/linux/utils.py:46
2015-01-21 10:48:12.096 ERROR neutron.agent.linux.utils [-]
Command: ['ping', '-c', '5', '-W', '1', '-i', '0.2', '10.0.100.10']
Exit code: 1
Stdout: 'PING 10.0.100.10 (10.0.100.10) 56(84) bytes of data.\n\n--- 10.0.100.10 ping statistics ---\n5 packets transmitted, 0 received, 100% packet loss, time 832ms\n\n'
Stderr: ''
2015-01-21 10:48:12.097 WARNING neutron.plugins.cisco.cfg_agent.device_status [-] Cannot ping ip address: 10.0.100.10
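The agent's reachability probe can be reproduced standalone. A minimal sketch; the function name and warning text are illustrative, only the ping flags (`-c 5 -W 1 -i 0.2`) come from the log above:

```python
import subprocess

def is_reachable(ip, count=5, timeout=1, interval=0.2):
    """Mirror the cfg agent probe seen in the log:
    ping -c 5 -W 1 -i 0.2 <ip>. iputils ping exits non-zero
    when not every echo request is answered."""
    cmd = ["ping", "-c", str(count), "-W", str(timeout),
           "-i", str(interval), str(ip)]
    return subprocess.run(cmd, capture_output=True).returncode == 0

if __name__ == "__main__":
    ip = "10.0.100.10"  # CSR mgmt address from the report
    if not is_reachable(ip):
        print("Cannot ping ip address: %s" % ip)
```

Note that 0.2 s is the smallest interval iputils ping accepts without root, which is presumably why the agent uses it.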

localadmin@qa1:~/devstack$ ping 10.0.100.10
PING 10.0.100.10 (10.0.100.10) 56(84) bytes of data.
64 bytes from 10.0.100.10: icmp_seq=1 ttl=255 time=1.06 ms
64 bytes from 10.0.100.10: icmp_seq=2 ttl=255 time=0.927 ms
64 bytes from 10.0.100.10: icmp_seq=3 ttl=255 time=1.00 ms
64 bytes from 10.0.100.10: icmp_seq=4 ttl=255 time=0.906 ms
64 bytes from 10.0.100.10: icmp_seq=5 ttl=255 time=1.03 ms
64 bytes from 10.0.100.10: icmp_seq=6 ttl=255 time=1.04 ms
64 bytes from 10.0.100.10: icmp_seq=7 ttl=255 time=1.18 ms
64 bytes from 10.0.100.10: icmp_seq=8 ttl=255 time=1.08 ms
64 bytes from 10.0.100.10: icmp_seq=9 ttl=255 time=1.37 ms
64 bytes from 10.0.100.10: icmp_seq=10 ttl=255 time=1.19 ms
64 bytes from 10.0.100.10: icmp_seq=11 ttl=255 time=0.993 ms
64 bytes from 10.0.100.10: icmp_seq=12 ttl=255 time=1.06 ms
64 bytes from 10.0.100.10: icmp_seq=13 ttl=255 time=1.12 ms
64 bytes from 10.0.100.10: icmp_seq=14 ttl=255 time=0.989 ms
64 bytes from 10.0.100.10: icmp_seq=15 ttl=255 time=0.951 ms
64 bytes from 10.0.100.10: icmp_seq=16 ttl=255 time=0.749 ms
64 bytes from 10.0.100.10: icmp_seq=17 ttl=255 time=0.944 ms
64 bytes from 10.0.100.10: icmp_seq=18 ttl=255 time=1.01 ms
64 bytes from 10.0.100.10: icmp_seq=19 ttl=255 time=1.15 ms
64 bytes from 10.0.100.10: icmp_seq=20 ttl=255 time=1.04 ms
64 bytes from 10.0.100.10: icmp_seq=21 ttl=255 time=1.04 ms
64 bytes from 10.0.100.10: icmp_seq=22 ttl=255 time=1.05 ms
64 bytes from 10.0.100.10: icmp_seq=23 ttl=255 time=0.920 ms
64 bytes from 10.0.100.10: icmp_seq=24 ttl=255 time=0.862 ms
64 bytes from 10.0.100.10: icmp_seq=25 ttl=255 time=0.998 ms
64 bytes from 10.0.100.10: icmp_seq=26 ttl=255 time=0.932 ms
64 bytes from 10.0.100.10: icmp_seq=27 ttl=255 time=0.926 ms
64 bytes from 10.0.100.10: icmp_seq=28 ttl=255 time=1.16 ms
64 bytes from 10.0.100.10: icmp_seq=29 ttl=255 time=1.03 ms
64 bytes from 10.0.100.10: icmp_seq=30 ttl=255 time=3.64 ms

64 bytes from 10.0.100.10: icmp_seq=56 ttl=255 time=4.22 ms
64 bytes from 10.0.100.10: icmp_seq=57 ttl=255 time=1.03 ms
64 bytes from 10.0.100.10: icmp_seq=58 ttl=255 time=1.34 ms
64 bytes from 10.0.100.10: icmp_seq=59 ttl=255 time=0.797 ms
^C
--- 10.0.100.10 ping statistics ---
59 packets transmitted, 34 received, 42% packet loss, time 58085ms
rtt min/avg/max/mdev = 0.749/1.200/4.229/0.701 ms
localadmin@qa1:~/devstack$

Tags: csr n1kv
Henry Gessau (gessau)
tags: added: csr
removed: cisco
description: updated
Revision history for this message
Hareesh Puthalath (hareesh-puthalath) wrote :

I verified this behavior in my test bed. Communication is indeed lost to the CSR mgmt network when a new tenant network is created or deleted. Step 3 (immediately attach the network to the router (CSR)) is not needed:
pings to the CSR mgmt interface already stop at step 2, the network create itself.

The created network has no relation to the mgmt network; it is a tenant network.
The same loss also happens when a previously created tenant network is deleted.

Changed in networking-cisco:
status: New → Confirmed
Revision history for this message
Hareesh Puthalath (hareesh-puthalath) wrote :

The underlying problem, however, is not the CSR failing to respond: packets destined for the CSR mgmt network are not reaching it at all, which suggests an issue with the L2 network implemented by the N1kv plugin.

====TCP dump of CSR mgmt interface (tap44d7844f-87) ======

13:55:30.709748 fa:16:3e:6d:53:b2 > fa:16:3e:4b:c7:07, ethertype IPv4 (0x0800), length 98: 10.0.100.2 > 10.0.100.10: ICMP echo request, id 15186, seq 11, length 64
13:55:30.710878 fa:16:3e:4b:c7:07 > fa:16:3e:6d:53:b2, ethertype IPv4 (0x0800), length 98: 10.0.100.10 > 10.0.100.2: ICMP echo reply, id 15186, seq 11, length 64
13:55:31.710324 fa:16:3e:6d:53:b2 > fa:16:3e:4b:c7:07, ethertype IPv4 (0x0800), length 98: 10.0.100.2 > 10.0.100.10: ICMP echo request, id 15186, seq 12, length 64
13:55:31.710915 fa:16:3e:4b:c7:07 > fa:16:3e:6d:53:b2, ethertype IPv4 (0x0800), length 98: 10.0.100.10 > 10.0.100.2: ICMP echo reply, id 15186, seq 12, length 64

<<No packets are reaching the CSR>>

13:55:41.011888 fa:16:3e:6d:53:b2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 64: Request who-has 10.0.100.10 (ff:ff:ff:ff:ff:ff) tell 10.0.100.2, length 50
13:55:41.015453 fa:16:3e:4b:c7:07 > fa:16:3e:6d:53:b2, ethertype ARP (0x0806), length 60: Reply 10.0.100.10 is-at fa:16:3e:4b:c7:07, length 46
13:55:41.709741 fa:16:3e:6d:53:b2 > fa:16:3e:4b:c7:07, ethertype IPv4 (0x0800), length 98: 10.0.100.2 > 10.0.100.10: ICMP echo request, id 15186, seq 22, length 64
13:55:41.710437 fa:16:3e:4b:c7:07 > fa:16:3e:6d:53:b2, ethertype IPv4 (0x0800), length 98: 10.0.100.10 > 10.0.100.2: ICMP echo reply, id 15186, seq 22, length 64
13:55:42.709754 fa:16:3e:6d:53:b2 > fa:16:3e:4b:c7:07, ethertype IPv4 (0x0800), length 98: 10.0.100.2 > 10.0.100.10: ICMP echo request, id 15186, seq 23, length 64
13:

Revision history for this message
Hareesh Puthalath (hareesh-puthalath) wrote :

Further investigation found that connectivity is restored when an ARP packet is sent (see the dump above). I tried this manually: as soon as an ARP is sent for the CSR mgmt interface IP, connectivity is restored.
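Since an ARP for the mgmt IP repopulates whatever forwarding state the VEM lost, one manual workaround is to emit exactly such a frame (in practice `arping` on the right interface does the same). A minimal sketch of building the broadcast ARP request with the standard library, using the addresses from the dump above; actually sending it needs a raw AF_PACKET socket and root, so that part is only shown in comments:

```python
import struct

def build_arp_request(src_mac, src_ip, target_ip):
    """Build an Ethernet frame carrying an ARP who-has request:
    broadcast destination, EtherType 0x0806, hardware type 1,
    protocol type 0x0800 (IPv4), opcode 1 (request)."""
    def mac(s):
        return bytes(int(b, 16) for b in s.split(":"))

    def ip(s):
        return bytes(int(o) for o in s.split("."))

    eth = mac("ff:ff:ff:ff:ff:ff") + mac(src_mac) + struct.pack("!H", 0x0806)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)   # htype, ptype, hlen, plen, op
    arp += mac(src_mac) + ip(src_ip)                  # sender hw / proto address
    arp += mac("00:00:00:00:00:00") + ip(target_ip)   # target hw / proto address
    return eth + arp

if __name__ == "__main__":
    frame = build_arp_request("fa:16:3e:6d:53:b2", "10.0.100.2", "10.0.100.10")
    # Sending requires root on Linux:
    # import socket
    # s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
    # s.bind(("tap44d7844f-87", 0))
    # s.send(frame)
```

This builds the same kind of request seen at 13:55:41 in the dump, after which the ICMP traffic resumed.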

tags: added: n1kv
Revision history for this message
Hareesh Puthalath (hareesh-puthalath) wrote :

More details about the L2 subsystem

Plugin used: N1kv monolithic plugin

neutron --version
2.3.9.39

Pulled from neutron/master on Mon Oct 27 2014

VSM: n1000v-dk9.5.2.1.SK1.3.0.135.iso
VEM: nexus_1000v_vem-12.04-5.2.1.SK1.3.0.135.S0-0gdb.deb

Mgmt network is created using the following policy profile

$neutron cisco-network-profile-show osn_mgmt_np
+--------------------+--------------------------------------+
| Field | Value |
+--------------------+--------------------------------------+
| id | 674a3be6-8d63-4baa-a419-66dfc6a57324 |
| multicast_ip_range | |
| name | osn_mgmt_np |
| physical_network | osn_phy_network |
| segment_range | 100-100 |
| segment_type | vlan |
| sub_type | |
+--------------------+--------------------------------------+

$ neutron net-show osn_mgmt_nw
+---------------------------+--------------------------------------+
| Field | Value |
+---------------------------+--------------------------------------+
| admin_state_up | True |
| id | ed27e904-1c7b-41a7-8ca0-3191d5ef3cb6 |
| n1kv:member_segments | |
| n1kv:profile_id | 674a3be6-8d63-4baa-a419-66dfc6a57324 |
| name | osn_mgmt_nw |
| provider:network_type | vlan |
| provider:physical_network | osn_phy_network |
| provider:segmentation_id | 100 |
| router:external | False |
| shared | False |
| status | ACTIVE |
| subnets | a7cfab3c-e4a6-440b-9703-bfe75175d0ba |
| tenant_id | 67428a8854594f839476ed23ba9963e4 |
+---------------------------+--------------------------------------+

The mgmt network is exposed to the host via a veth pair, with one interface on the host side
and the other on the bridge side (br-int).

Code that creates it: https://github.com/CiscoSystems/devstack/blob/csr1kv_for_routing_juno_minimal/lib/neutron_plugins/services/csr1kv_l3_setup/setup_l3cfgagent_networking.sh line 83
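The linked script's veth setup can be sketched as the following command sequence; this is a rough paraphrase, not the script itself, and the interface names are illustrative:

```python
def mgmt_veth_commands(host_if, bridge_if, bridge="br-int"):
    """Roughly what the devstack helper runs: create a veth pair,
    bring both ends up, and plug the bridge end into br-int."""
    return [
        ["ip", "link", "add", host_if, "type", "veth",
         "peer", "name", bridge_if],
        ["ip", "link", "set", host_if, "up"],
        ["ip", "link", "set", bridge_if, "up"],
        ["ovs-vsctl", "add-port", bridge, bridge_if],
    ]

if __name__ == "__main__":
    import subprocess
    for cmd in mgmt_veth_commands("l3cfg0", "l3cfg0-br"):
        subprocess.check_call(cmd)  # needs root and Open vSwitch installed
```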

$ ovs
f7d8aaec-a240-4d10-8cc4-1f568bdebfc7
    Bridge br-int
        Controller "tcp:127.0.0.1"
            is_connected: true
        fail_mode: secure
        Port "tapbd6a7847-bf"
            Interface "tapbd6a7847-bf"
        .......
        Port "tap44d7844f-87" <========CSR mgmt interface
            Interface "tap44d7844f-87"
        Port "tapde43a743-06"
            Interface "tapde43a743-06"
        Port "tap6aff5ea7-a0"
            Interface "tap6aff5ea7-a0"
        Port "tapc3cfa984-e9"
            Interface "tapc3cfa984-e9"
        Port br-int
            Interface br-int
                type: internal
        Port "...


Revision history for this message
Hareesh Puthalath (hareesh-puthalath) wrote :

It appears that the communication between the n1kv plugin and the VSM is the trigger for this issue.

In n1kv_neutron_plugin.py
(https://github.com/openstack/neutron/blob/master/neutron/plugins/cisco/n1kv/n1kv_neutron_plugin.py),
in the method create_network(self, context, network),

I commented out the call self._send_create_network_request(context, net, segment_pairs) at line 945. This is the place where the n1kv plugin informs the VSM of the new network.

With this call commented out, creating a new network through the L2 API (neutron net-create) no longer caused any issue reaching the CSR mgmt interface. The same held for network delete.
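The experiment can be paraphrased as follows. This is a simplified stand-in, not the actual neutron plugin code; everything except the name _send_create_network_request is illustrative:

```python
class FakeDbPlugin:
    """Stand-in for the neutron DB base plugin: just records the network."""
    def create_network(self, context, network):
        return {"id": "net-1", **network["network"]}


class N1kvLikePlugin(FakeDbPlugin):
    """Illustrative plugin with the VSM notification disabled."""
    def __init__(self):
        self.vsm_notified = False

    def _send_create_network_request(self, context, net, segment_pairs):
        # In the real plugin this is where the VSM is told about the
        # new network; skipping it made the connectivity blip disappear.
        self.vsm_notified = True

    def create_network(self, context, network):
        net = super().create_network(context, network)
        # self._send_create_network_request(context, net, [])  # disabled, as in the experiment
        return net


if __name__ == "__main__":
    plugin = N1kvLikePlugin()
    plugin.create_network(None, {"network": {"name": "tenant-net"}})
    print("VSM notified:", plugin.vsm_notified)
```

The network still exists in the Neutron DB, so the L2 API call succeeds; the VSM simply never learns about it, which is what isolated the VSM notification as the trigger.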

summary: - Communication lost to CSR after network create
+ Communication lost to N1kv vlan provider network after tenant network
+ create/delete
Abhishek Raut (abhraut)
Changed in networking-cisco:
assignee: nobody → Abhishek Raut (abhraut)