Traceback ERROR: VNI delete validation failed during VM deletion

Bug #1417825 reported by Danny Choi
This bug affects 1 person
Affects:      networking-cisco
Status:       New
Importance:   Undecided
Assigned to:  Unassigned
Milestone:    (none)

Bug Description

The following traceback is logged in screen-q-svc.log during VM deletion:

Traceback (most recent call last):
  File "/opt/stack/neutron/neutron/plugins/ml2/managers.py", line 299, in _call_on_drivers
    getattr(driver.obj, method_name)(context)
  File "/opt/stack/neutron/neutron/openstack/common/lockutils.py", line 272, in inner
    return f(*args, **kwargs)
  File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/mech_cisco_nexus.py", line 416, in delete_port_postcommit
    self._delete_nve_member) if vxlan_segment else 0
  File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/mech_cisco_nexus.py", line 337, in _port_action_vxlan
    func(vni, device_id, mcast_group, host_id)
  File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/mech_cisco_nexus.py", line 160, in _delete_nve_member
    vni)
  File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/nexus_network_driver.py", line 341, in delete_nve_member
    self._edit_config(nexus_host, config=confstr)
  File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/nexus_network_driver.py", line 111, in _edit_config
    raise cexc.NexusConfigFailed(config=config, exc=e)
NexusConfigFailed: Failed to configure Nexus:
      <config xmlns:xc="urn:ietf:params:xml:ns:netconf:base:1.0">
        <configure>
          <__XML__MODE__exec_configure>
<interface>
    <nve>nve1</nve>
    <__XML__MODE_if-nve>
        <member>no member vni 9000</member>
    </__XML__MODE_if-nve>
</interface>

          </__XML__MODE__exec_configure>
        </configure>
      </config>
. Reason: ERROR: VNI delete validation failed
.
2015-02-03 15:51:35 ERROR thread=140215249078704 [neutron.plugins.ml2.plugin] mechanism_manager.delete_port_postcommit failed for port 4ad78d2a-c30f-4410-b8f8-d7c3ace379fd

Despite the error, the Nexus switch config is cleaned up correctly.
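
For context, a minimal sketch of the code path the traceback shows: delete_port_postcommit ends up in the driver's delete_nve_member(), which pushes a "no member vni" edit-config to the switch over NETCONF and wraps any RPC error in NexusConfigFailed. This is a simplified illustration, not the actual networking-cisco driver code; the ncclient connection details, credentials and port are assumptions.

# Simplified sketch of the failing path above (not the actual driver code).
# Host, credentials and port are placeholders; the real driver reads them
# from its ML2 configuration.
from ncclient import manager


class NexusConfigFailed(Exception):
    """Raised when the switch rejects an edit-config request."""


DELETE_NVE_MEMBER = """
      <config xmlns:xc="urn:ietf:params:xml:ns:netconf:base:1.0">
        <configure>
          <__XML__MODE__exec_configure>
            <interface>
                <nve>nve%(nve)s</nve>
                <__XML__MODE_if-nve>
                    <member>no member vni %(vni)s</member>
                </__XML__MODE_if-nve>
            </interface>
          </__XML__MODE__exec_configure>
        </configure>
      </config>"""


def delete_nve_member(nexus_host, nve_int_num, vni):
    """Remove a VNI member from interface nve<N> on the given switch."""
    confstr = DELETE_NVE_MEMBER % {'nve': nve_int_num, 'vni': vni}
    try:
        # NETCONF-over-SSH session to the Nexus switch (placeholder creds).
        with manager.connect(host=nexus_host, port=22, username='admin',
                             password='password',
                             hostkey_verify=False) as conn:
            conn.edit_config(target='running', config=confstr)
    except Exception as exc:
        # The switch answers "ERROR: VNI delete validation failed" when the
        # member cannot be removed (e.g. it is already gone); ML2 then logs
        # the delete_port_postcommit failure seen above.
        raise NexusConfigFailed('Failed to configure Nexus: %s. Reason: %s'
                                % (confstr, exc))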

Steps to repro:
1. Pull in Rich's lockutils patch from staging/junoplus.
2. Using the CLI, launch 100 VMs from the cirros image, each with 2 interfaces on 2 different networks (see the launch sketch after this list).
3. Verify the N9Ks are configured correctly with the VXLAN parameters.
4. Check logs for error/traceback.
5. Note the errors in screen-n-cond.log:

screen-n-cond.log:2015-02-03 13:13:19.152 ERROR nova.scheduler.utils [req-6e3c07e8-2d6a-427b-8c27-858a7a821270 demo demo] [instance: a33d7012-3429-46f9-a160-3c863d2b74b2] Error from last host: qa5 (node qa5): [u'Traceback (most recent call last):\n', u' File "/opt/stack/nova/nova/compute/manager.py", line 2033, in _do_build_and_run_instance\n filter_properties)\n', u' File "/opt/stack/nova/nova/compute/manager.py", line 2164, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance a33d7012-3429-46f9-a160-3c863d2b74b2 was re-scheduled: Connection to neutron failed: HTTPConnectionPool(host='172.29.172.161', port=9696): Read timed out. (read timeout=30)\n"]
screen-n-cond.log:2015-02-03 13:13:20.050 ERROR nova.scheduler.utils [req-6e3c07e8-2d6a-427b-8c27-858a7a821270 demo demo] [instance: 9cd36f7f-13a5-41ea-84b2-3bc860ecb475] Error from last host: qa6 (node qa6): [u'Traceback (most recent call last):\n', u' File "/opt/stack/nova/nova/compute/manager.py", line 2033, in _do_build_and_run_instance\n filter_properties)\n', u' File "/opt/stack/nova/nova/compute/manager.py", line 2164, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance 9cd36f7f-13a5-41ea-84b2-3bc860ecb475 was re-scheduled: Connection to neutron failed: HTTPConnectionPool(host='172.29.172.161', port=9696): Read timed out. (read timeout=30)\n"]

6. Despite the errors, all 100 VMs come up active/running.
7. Using Horizon, delete 20 VMs at a time to delete all 100 VMs.
8. Verify the N9Ks config is removed correctly.
9. Check the traceback in screen-q-svc.log.
10. If no traceback, repeat steps 2 - 8.
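
For step 2 above, a hedged sketch of the launch loop using python-novaclient. Every name, UUID and credential below is a placeholder rather than a value from this setup; the same thing can be done from a shell loop with "nova boot ... --nic net-id=<net1> --nic net-id=<net2>".

# Hedged sketch of step 2: launch 100 cirros VMs, each with two NICs on two
# different tenant networks.  All names, UUIDs and credentials are placeholders.
from novaclient import client

nova = client.Client('2', 'demo', 'password', 'demo',
                     auth_url='http://<keystone-host>:5000/v2.0')

NET_A = '<uuid-of-first-tenant-network>'   # placeholder
NET_B = '<uuid-of-second-tenant-network>'  # placeholder

image = nova.images.find(name='cirros')    # placeholder image name
flavor = nova.flavors.find(name='m1.tiny')

for i in range(100):
    nova.servers.create(name='vxlan-test-%03d' % i,
                        image=image,
                        flavor=flavor,
                        # two interfaces on two different networks
                        nics=[{'net-id': NET_A}, {'net-id': NET_B}])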

Tags: nexus
Revision history for this message
Danny Choi (dannchoi) wrote :

In the first trial, I saw a traceback in screen-n-cond.log during creation but no traceback in screen-q-svc.log during deletion. In subsequent trials, there was a traceback in screen-n-cond.log during creation and a traceback in screen-q-svc.log during deletion.

Despite the tracebacks, all VMs are created and deleted successfully in subsequent trials.

Revision history for this message
Danny Choi (dannchoi) wrote :

A similar traceback is logged during VM deletion in the following scenario:

1. Use Carol's latest patch for the replay code, https://review.gerrithub.io/217434.
2. Using the CLI, launch 100 VMs from the cirros image, each with 2 interfaces on 2 different networks, spanning 3 compute nodes.
3. Verify the two N9Ks are configured correctly with the VXLAN parameters.
4. Verify all 100 VMs are active and running.
5. Reboot the N9K that has the Network node and Compute-1 connected to it.
6. Upon boot up, verify the replay code reconfigures the switch correctly with the VXLAN parameters.
7. Using Horizon, delete 20 VMs at a time to delete all 100 VMs.
8. Verify the N9Ks config is removed correctly.
9. Check the traceback in screen-q-svc.log.
10. Note the following traceback.

2015-03-05 10:33:03.610 DEBUG neutron.plugins.ml2.drivers.cisco.nexus.nexus_network_driver [req-27f75e6b-c7eb-4e8e-8979-804c2dc34d4d admin 05e1512cadd84617a8cb4bc1f3e64c80
] NexusDriver:
      <config xmlns:xc="urn:ietf:params:xml:ns:netconf:base:1.0">
        <configure>
          <__XML__MODE__exec_configure>
<interface>
    <nve>nve1</nve>
    <__XML__MODE_if-nve>
        <member>no member vni 222224</member>
    </__XML__MODE_if-nve>
</interface>

          </__XML__MODE__exec_configure>
        </configure>
      </config>
 from (pid=31927) delete_nve_member /opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/nexus_network_driver.py:380
2015-03-05 10:33:03.610 DEBUG neutron.plugins.ml2.drivers.cisco.nexus.nexus_network_driver [req-27f75e6b-c7eb-4e8e-8979-804c2dc34d4d admin 05e1512cadd84617a8cb4bc1f3e64c80
] NexusDriver config:
      <config xmlns:xc="urn:ietf:params:xml:ns:netconf:base:1.0">
        <configure>
          <__XML__MODE__exec_configure>
<interface>
    <nve>nve1</nve>
    <__XML__MODE_if-nve>
        <member>no member vni 222224</member>
    </__XML__MODE_if-nve>
</interface>

          </__XML__MODE__exec_configure>
        </configure>
      </config>
 from (pid=31927) _edit_config /opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/nexus_network_driver.py:99
2015-03-05 10:33:03.612 INFO neutron.wsgi [req-5f6c2b8e-f759-4e88-ba7f-b6ae361b303e admin 05e1512cadd84617a8cb4bc1f3e64c80] 172.29.172.163 - - [05/Mar/
2015 10:33:03] "DELETE /v2.0/ports/ab786270-ed85-4439-afe4-672c05555b82.json HTTP/1.1" 204 149 2.072496
2015-03-05 10:33:03.840 ERROR neutron.plugins.ml2.drivers.cisco.nexus.mech_cisco_nexus [req-27f75e6b-c7eb-4e8e-8979-804c2dc34d4d admin 05e1512cadd84617a8cb4bc1f3e64c80
] Nexus Driver cisco_nexus failed in (delete_nve_member||disable_vxlan_feature)
2015-03-05 10:33:03.840 TRACE neutron.plugins.ml2.drivers.cisco.nexus.mech_cisco_nexus Traceback (most recent call last):
2015-03-05 10:33:03.840 TRACE neutron.plugins.ml2.drivers.cisco.nexus.mech_cisco_nexus File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/mech_cisco_nexus.py",
 line 364, in _delete_nve_member
2015-03-05 10:33:03.840 TRACE neutron.plugins.ml2.drivers.cisco.nexus.mech_cisco_nexus const.NVE_INT_NUM, vni)
2015-03-05 10:33:03.840 TRACE neutron.plugins.ml2.drivers.cisco.nexus.mech_cisco_nexus File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco...


Revision history for this message
Danny Choi (dannchoi) wrote :

This traceback still happens in Kilo.

Below is the log when deleting 2 VMs on the same tenant network from a compute node:

2015-04-21 08:21:26.045 DEBUG networking_cisco.plugins.ml2.drivers.cisco.nexus.nexus_db_v2 [req-d5c56275-d893-403d-9aa6-a6f5f08dbda0 demo 142fa7f81fba4f729d2004a88b3ee364] get_nve_vni_switch_bindings() called from (pid=20662) get_nve_vni_switch_bindings /usr/local/lib/python2.7/dist-packages/networking_cisco/plugins/ml2/drivers/cisco/nexus/nexus_db_v2.py:195
2015-04-21 08:21:26.047 DEBUG networking_cisco.plugins.ml2.drivers.cisco.nexus.nexus_network_driver [req-d5c56275-d893-403d-9aa6-a6f5f08dbda0 demo 142fa7f81fba4f729d2004a88b3ee364] NexusDriver: from (pid=20662) delete_nve_member /usr/local/lib/python2.7/dist-packages/networking_cisco/plugins/ml2/drivers/cisco/nexus/nexus_network_driver.py:392
2015-04-21 08:21:26.047 DEBUG networking_cisco.plugins.ml2.drivers.cisco.nexus.nexus_network_driver [req-d5c56275-d893-403d-9aa6-a6f5f08dbda0 demo 142fa7f81fba4f729d2004a88b3ee364] NexusDriver edit config:
      <config xmlns:xc="urn:ietf:params:xml:ns:netconf:base:1.0">
        <configure>
          <__XML__MODE__exec_configure>
<interface>
    <nve>nve1</nve>
    <__XML__MODE_if-nve>
        <member>no member vni 12345</member>
    </__XML__MODE_if-nve>
</interface>

          </__XML__MODE__exec_configure>
        </configure>
      </config>
 from (pid=20662) _edit_config /usr/local/lib/python2.7/dist-packages/networking_cisco/plugins/ml2/drivers/cisco/nexus/nexus_network_driver.py:107
2015-04-21 08:21:26.079 DEBUG oslo_messaging._drivers.amqp [-] unpacked context: {u'read_only': False, u'domain': None, u'project_name': None, u'user_id': None, u'show_deleted': False, u'roles': [u'admin'], u'user_identity': u'- - - - -', u'project_domain': None, u'tenant_name': None, u'auth_token': None, u'resource_uuid': None, u'project_id': None, u'tenant_id': None, u'is_admin': True, u'user': None, u'request_id': u'req-77049958-52c2-4047-9f45-5ddd743020ca', u'tenant': None, u'user_domain': None, u'timestamp': u'2015-04-20 20:37:13.854969', u'read_deleted': u'no', u'user_name': None} from (pid=20662) unpack_context /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqp.py:203
2015-04-21 08:21:26.080 DEBUG neutron.plugins.ml2.rpc [req-77049958-52c2-4047-9f45-5ddd743020ca None None] Device 7505f47b-6d06-4589-96bf-b5eaff2ad5be no longer exists at agent ovs-agent-qa6 from (pid=20662) update_device_down /opt/stack/neutron/neutron/plugins/ml2/rpc.py:144
2015-04-21 08:21:26.083 DEBUG neutron.plugins.ml2.plugin [req-77049958-52c2-4047-9f45-5ddd743020ca None None] No Port match for: 7505f47b-6d06-4589-96bf-b5eaff2ad5be from (pid=20662) port_bound_to_host /opt/stack/neutron/neutron/plugins/ml2/plugin.py:1435
2015-04-21 08:21:26.083 DEBUG neutron.plugins.ml2.rpc [req-77049958-52c2-4047-9f45-5ddd743020ca None None] Device 7505f47b-6d06-4589-96bf-b5eaff2ad5be not bound to the agent host qa6 from (pid=20662) update_device_down /opt/stack/neutron/neutron/plugins/ml2/rpc.py:152
2015-04-21 08:21:26.084 DEBUG oslo_messaging._drivers.amqp [req-77049958-52c2-4047-9f45-5ddd743020ca None None] UNIQUE_ID is a5da0275b34446febd11acce6c377e4e. from ...
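
Illustrative only, not the driver's actual logic: the log shows get_nve_vni_switch_bindings() being consulted right before delete_nve_member. Below is a sketch of how a caller could use such a lookup to skip the "no member vni" edit-config while other ports on the same switch still use the VNI, which is the situation created here by deleting two VMs that share one tenant network. The helper name and signature are assumed from the log paths above and may differ in the real nexus_db_v2 module.

# Illustrative guard, not the actual networking-cisco code.  The signature of
# get_nve_vni_switch_bindings() is assumed from the log paths above.
from networking_cisco.plugins.ml2.drivers.cisco.nexus import nexus_db_v2


def maybe_delete_nve_member(driver, nexus_host, nve_int_num, vni):
    # Skip the switch update while other port bindings on this switch still
    # reference the VNI; a repeated or concurrent delete otherwise triggers
    # "ERROR: VNI delete validation failed" on the N9K.
    remaining = nexus_db_v2.get_nve_vni_switch_bindings(vni, nexus_host)
    if remaining:
        return
    driver.delete_nve_member(nexus_host, nve_int_num, vni)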
