EMC-POC: Nexus VLAN plugin: With multiple Neutron workers enabled (> 8), VM deletion fails intermittently

Bug #1475409 reported by Danny Choi
This bug affects 1 person
Affects: networking-cisco
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Multiple Neutron workers are enabled in neutron.conf as follows:
   - api_workers=48
   - rpc_workers=48
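For reference, both options are set in the [DEFAULT] section of neutron.conf. The minimal sketch below uses Python's standard configparser to render just that section and read the worker counts back. It writes to a temporary file rather than the real /etc/neutron/neutron.conf, and it shows only the two options from this report, not a complete configuration.

```python
import configparser
import os
import tempfile

# Render only the two worker options from this report; a real neutron.conf
# contains many more settings.
config = configparser.ConfigParser()
config["DEFAULT"] = {"api_workers": "48", "rpc_workers": "48"}

# Write to a temporary file instead of the real /etc/neutron/neutron.conf.
path = os.path.join(tempfile.mkdtemp(), "neutron.conf")
with open(path, "w") as f:
    config.write(f)

# Read the file back and confirm the worker counts.
parsed = configparser.ConfigParser()
parsed.read(path)
api_workers = parsed.getint("DEFAULT", "api_workers")
rpc_workers = parsed.getint("DEFAULT", "rpc_workers")
print(api_workers, rpc_workers)  # 48 48
```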

Relay is also enabled.

The following are configured:
   - 20 tenants
   - Each tenant has 5 tenant networks
   - For each network, one VM on each of the 2 Compute nodes, for a total of 10 VMs per tenant
   - In total: 100 VLANs / 200 VMs
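The totals follow directly from the per-tenant layout. The short calculation below assumes, as the 100-VLAN figure implies, that each tenant network is mapped to its own VLAN.

```python
# Scale of the test bed described above.
TENANTS = 20
NETWORKS_PER_TENANT = 5
COMPUTE_NODES = 2
VMS_PER_NETWORK = COMPUTE_NODES  # one VM per network on each Compute node

# Assumption: each tenant network is backed by exactly one VLAN.
vlans = TENANTS * NETWORKS_PER_TENANT
vms_per_tenant = NETWORKS_PER_TENANT * VMS_PER_NETWORK
total_vms = TENANTS * vms_per_tenant

print(vlans, vms_per_tenant, total_vms)  # 100 10 200
```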

A script does the following in tenant-1:
   1. Delete all 10 VMs in a single CLI command
   2. For each network, launch 2 VMs (one on each Compute node)
   3. Repeat steps 1–2
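The test loop can be sketched as follows. This is a hypothetical reconstruction, not the reporter's script. It assumes the python-novaclient `nova` CLI, where `nova delete` accepts several servers in one call and `--availability-zone nova:<host>` pins a VM to a Compute node. The network IDs, host names, flavor, image and VM names are placeholders. The commands are only built and printed here, not executed.

```python
import shlex
from typing import List, Tuple

# Hypothetical placeholders: substitute the real network IDs, Compute host
# names, flavor and image used in tenant-1.
NETWORKS = ["net-1", "net-2", "net-3", "net-4", "net-5"]
HOSTS = ["compute-1", "compute-2"]
FLAVOR = "m1.tiny"
IMAGE = "cirros"


def build_iteration(existing_vms: List[str]) -> Tuple[List[List[str]], List[str]]:
    """Return (commands, new_vm_names) for one pass of the test loop.

    The commands are only built here, not executed.
    """
    # Step 1: delete every existing VM in one CLI command.
    commands = [["nova", "delete", *existing_vms]]
    new_vms = []
    # Step 2: for each network, boot one VM on each Compute node.
    for net in NETWORKS:
        for host in HOSTS:
            name = f"vm-{net}-{host}"
            commands.append([
                "nova", "boot",
                "--flavor", FLAVOR,
                "--image", IMAGE,
                "--nic", f"net-id={net}",
                "--availability-zone", f"nova:{host}",
                name,
            ])
            new_vms.append(name)
    return commands, new_vms


# Step 3: repeat steps 1-2; each pass deletes the VMs booted by the previous one.
vms = [f"vm-{net}-{host}" for net in NETWORKS for host in HOSTS]
for _ in range(2):
    commands, vms = build_iteration(vms)
    for cmd in commands:
        print(shlex.join(cmd))
```

Keeping command construction separate from execution makes the delete/boot sequence easy to inspect before pointing it at a live cloud.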

Intermittently, one or more VMs fail to be deleted and end up in ERROR state.
Note that the ml2 and Nexus port database entries are not deleted.
The user has to delete the VM(s) again to remove them completely.

In /var/log/messages, the following traceback is seen:

Jul 16 15:20:45 bxb-ds-46 neutron-server: Traceback (most recent call last):
Jul 16 15:20:45 bxb-ds-46 neutron-server: File "/usr/lib/python2.7/site-packages/eventlet/greenpool.py", line 82, in _spawn_n_impl
Jul 16 15:20:45 bxb-ds-46 neutron-server: func(*args, **kwargs)
Jul 16 15:20:45 bxb-ds-46 neutron-server: File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 661, in process_request
Jul 16 15:20:45 bxb-ds-46 neutron-server: proto.__init__(sock, address, self)
Jul 16 15:20:45 bxb-ds-46 neutron-server: File "/usr/lib64/python2.7/SocketServer.py", line 649, in __init__
Jul 16 15:20:45 bxb-ds-46 neutron-server: self.handle()
Jul 16 15:20:45 bxb-ds-46 neutron-server: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 342, in handle
Jul 16 15:20:45 bxb-ds-46 neutron-server: self.handle_one_request()
Jul 16 15:20:45 bxb-ds-46 neutron-server: File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 267, in handle_one_request
Jul 16 15:20:45 bxb-ds-46 neutron-server: self.raw_requestline = self.rfile.readline(self.server.url_length_limit)
Jul 16 15:20:45 bxb-ds-46 neutron-server: File "/usr/lib64/python2.7/socket.py", line 476, in readline
Jul 16 15:20:45 bxb-ds-46 neutron-server: data = self._sock.recv(self._rbufsize)
Jul 16 15:20:45 bxb-ds-46 neutron-server: File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 309, in recv
Jul 16 15:20:45 bxb-ds-46 neutron-server: timeout_exc=socket.timeout("timed out"))
Jul 16 15:20:45 bxb-ds-46 neutron-server: File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 186, in _trampoline
Jul 16 15:20:45 bxb-ds-46 neutron-server: mark_as_closed=self._mark_as_closed)
Jul 16 15:20:45 bxb-ds-46 neutron-server: File "/usr/lib/python2.7/site-packages/eventlet/hubs/__init__.py", line 155, in trampoline
Jul 16 15:20:45 bxb-ds-46 neutron-server: listener = hub.add(hub.READ, fileno, current.switch, current.throw, mark_as_closed)
Jul 16 15:20:45 bxb-ds-46 neutron-server: File "/usr/lib/python2.7/site-packages/eventlet/hubs/epolls.py", line 49, in add
Jul 16 15:20:45 bxb-ds-46 neutron-server: listener = BaseHub.add(self, evtype, fileno, cb, tb, mac)
Jul 16 15:20:45 bxb-ds-46 neutron-server: File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 176, in add
Jul 16 15:20:45 bxb-ds-46 neutron-server: evtype, fileno, evtype, cb, bucket[fileno]))
Jul 16 15:20:45 bxb-ds-46 neutron-server: RuntimeError: Second simultaneous read on fileno 13 detected. Unless you really know what you're doing, make sure that only one greenthread can read any particular socket. Consider using a pools.Pool. If you do know what you're doing and want to disable this error, call eventlet.debug.hub_prevent_multiple_readers(False) - MY THREAD=<built-in method switch of greenlet.greenlet object at 0x4e35f50>; THAT THREAD=FdListener('read', 13, <built-in method switch of greenlet.greenlet object at 0x4e35cd0>, <built-in method throw of greenlet.greenlet object at 0x4e35cd0>)
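eventlet raises this RuntimeError when a second greenthread tries to wait for reads on a socket (here fileno 13) while another greenthread is already waiting on it. The class below is a hypothetical, heavily simplified model of that check, written only from what the traceback shows. It is not eventlet's implementation, and `ToyHub`, `add_reader` and `remove_reader` are illustrative names.

```python
class ToyHub:
    """Simplified model of the per-fileno reader check seen in the traceback.

    Not eventlet's code: it only mimics the visible behaviour, where
    registering a second reader on the same fileno raises RuntimeError.
    """

    def __init__(self, prevent_multiple_readers: bool = True):
        self.readers = {}  # fileno -> name of the greenthread waiting on it
        self.prevent_multiple_readers = prevent_multiple_readers

    def add_reader(self, fileno: int, thread: str) -> None:
        # Reject a second concurrent reader unless the check is disabled.
        if fileno in self.readers and self.prevent_multiple_readers:
            raise RuntimeError(
                f"Second simultaneous read on fileno {fileno} detected"
                f" - MY THREAD={thread}; THAT THREAD={self.readers[fileno]}"
            )
        self.readers[fileno] = thread

    def remove_reader(self, fileno: int) -> None:
        self.readers.pop(fileno, None)


hub = ToyHub()
error = None
hub.add_reader(13, "greenthread-A")  # first reader on socket 13: accepted
try:
    hub.add_reader(13, "greenthread-B")  # second concurrent reader: rejected
except RuntimeError as exc:
    error = str(exc)
    print(error)
hub.remove_reader(13)  # greenthread-A finishes reading
hub.add_reader(13, "greenthread-B")  # now greenthread-B may read
```

As the error message itself suggests, the usual remedy is to make sure only one greenthread uses a given socket at a time (for example by handing sockets out through a pool). Calling eventlet.debug.hub_prevent_multiple_readers(False) only silences the check.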

It is not clear whether this traceback is related to the failure, because it is also logged when the test passes, e.g. when each VM is deleted with a separate CLI command and there is a 1-second delay between deletes.
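The passing variant (one VM per CLI command, 1-second delay between deletes) can be sketched as below. The command runner and sleep function are injected so the sketch can be dry-run. The VM names are hypothetical, and the `nova delete` command assumes the python-novaclient CLI.

```python
import time
from typing import Callable, List


def delete_serially(vm_names: List[str],
                    run: Callable[[List[str]], object],
                    delay: float = 1.0,
                    sleep: Callable[[float], object] = time.sleep) -> None:
    """Delete VMs one per CLI command, pausing `delay` seconds between deletes."""
    for i, name in enumerate(vm_names):
        if i > 0:
            sleep(delay)  # pause only between deletes, not before the first
        run(["nova", "delete", name])


# Dry run: record the commands and pauses instead of executing them.
issued, pauses = [], []
delete_serially(["vm-a", "vm-b", "vm-c"], run=issued.append, sleep=pauses.append)
print(issued)
print(pauses)
```

For a real run, `run` could be `lambda cmd: subprocess.run(cmd, check=True)` with the default `time.sleep`.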

Tags: nexus cisco e-rel