dhcp cannot find tap device exceptions

Bug #1294254 reported by Darragh O'Reilly
This bug affects 1 person
Affects    Status         Importance   Assigned to        Milestone
neutron    Fix Released   Medium       Darragh O'Reilly
Icehouse   Fix Released   Undecided    Unassigned

Bug Description

This happens in tempest tests a lot:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOiBcImNhbm5vdCBmaW5kIGRldmljZVwiIEFORCBmaWxlbmFtZTpcImxvZ3Mvc2NyZWVuLXEtZGhjcC50eHRcIiBBTkQgYnVpbGRfYnJhbmNoOlwibWFzdGVyXCIgQU5EIHByb2plY3Q6XCJvcGVuc3RhY2svbmV1dHJvblwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI0MzIwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTUwNTgyMTI3OTF9

The _set_default_route function is called when the driver is called with 'enable' or 'reload_allocations'. It calls the get_dhcp_port RPC, which tries to get the previously created DHCP port for the network and host. But sometimes that port has already been deleted by this time, and then get_dhcp_port creates a new one. The new port has a different UUID from the original, so the tap name is different, and the call to list the routes on it raises an exception because that tap device was never created.

It is not clear from the code why _set_default_route needs to do this.
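
The sequence described above can be reproduced with a small model. This is only a sketch under stated assumptions: FakePlugin and tap_name are hypothetical stand-ins (not Neutron code), the tap name is assumed to be "tap" plus the leading characters of the port UUID cut to 14 characters (which matches the device names in the logs below), and the full port UUIDs are made up except for the "a01643ce-47" prefix taken from the first traceback.

```python
DEV_NAME_PREFIX = "tap"  # assumed prefix, matches the device names in the logs
DEV_NAME_LEN = 14        # assumed total length of a device name


def tap_name(port_id):
    """Derive a tap device name from a port UUID (illustrative only)."""
    return (DEV_NAME_PREFIX + port_id)[:DEV_NAME_LEN]


class FakePlugin:
    """Hypothetical stand-in for the plugin side of get_dhcp_port."""

    def __init__(self, port_ids):
        self._new_ids = iter(port_ids)  # UUIDs handed out to new ports
        self._ports = {}                # (network_id, host) -> port UUID

    def get_dhcp_port(self, network_id, host):
        # Get-or-create: a missing port is silently replaced by a new one.
        key = (network_id, host)
        if key not in self._ports:
            self._ports[key] = next(self._new_ids)
        return self._ports[key]

    def delete_port(self, network_id, host):
        self._ports.pop((network_id, host), None)


# Made-up UUIDs; only the "a01643ce-47" prefix comes from the log.
plugin = FakePlugin([
    "5f0c2d9e-1b3a-4c4d-9e8f-ba9876543210",
    "a01643ce-47aa-4bbb-8ccc-0123456789ab",
])
net, host = "9edab258-c056-40e3-a340-1c58622380bf", "some-host"

# 'enable': the agent plugs a tap device for the original port.
original_tap = tap_name(plugin.get_dhcp_port(net, host))
existing_devices = {original_tap}

# The port is deleted before the next 'reload_allocations'.
plugin.delete_port(net, host)

# _set_default_route asks again and is handed a brand-new port.
stale_tap = tap_name(plugin.get_dhcp_port(net, host))

print(stale_tap, "exists:", stale_tap in existing_devices)
```

The second lookup yields a device name that was never plugged into the namespace, which is the condition that makes "ip route list dev ..." fail.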

Example from call_driver reload_allocations
http://logs.openstack.org/12/59212/12/check/check-tempest-dsvm-neutron-full/e3c77d0/logs/screen-q-dhcp.txt.gz#_2014-03-17_10_16_03_166

[req-024b81f2-eac3-490f-920d-b9fa1f70f51e None] Unable to reload_allocations dhcp for 9edab258-c056-40e3-a340-1c58622380bf.
Traceback (most recent call last):
  File "/opt/stack/new/neutron/neutron/agent/dhcp_agent.py", line 127, in call_driver
    getattr(driver, action)(**action_kwargs)
  File "/opt/stack/new/neutron/neutron/agent/linux/dhcp.py", line 400, in reload_allocations
    self.device_manager.update(self.network)
  File "/opt/stack/new/neutron/neutron/agent/linux/dhcp.py", line 810, in update
    self._set_default_route(network)
  File "/opt/stack/new/neutron/neutron/agent/linux/dhcp.py", line 669, in _set_default_route
    gateway = device.route.get_gateway()
  File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 388, in get_gateway
    *filters).split('\n')
  File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 211, in _run
    return self._parent._run(kwargs.get('options', []), self.COMMAND, args)
  File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 52, in _run
    return self._as_root(options, command, args)
  File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 70, in _as_root
    namespace)
  File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 81, in _execute
    root_helper=root_helper)
  File "/opt/stack/new/neutron/neutron/agent/linux/utils.py", line 76, in execute
    raise RuntimeError(m)
RuntimeError:
Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qdhcp-9edab258-c056-40e3-a340-1c58622380bf', 'ip', 'route', 'list', 'dev', 'tapa01643ce-47']
Exit code: 1
Stdout: ''
Stderr: 'Cannot find device "tapa01643ce-47"\n'

Example from call_driver 'enable'
http://logs.openstack.org/12/59212/12/check/check-tempest-dsvm-neutron-full/e3c77d0/logs/screen-q-dhcp.txt.gz#_2014-03-17_10_16_03_166

[req-e5cdbe16-076d-4012-80af-7873ff14cdeb None] Unable to enable dhcp for 3236628d-4ccd-4c29-a2b6-ca4752725874.
Traceback (most recent call last):
  File "/opt/stack/new/neutron/neutron/agent/dhcp_agent.py", line 127, in call_driver
    getattr(driver, action)(**action_kwargs)
  File "/opt/stack/new/neutron/neutron/agent/linux/dhcp.py", line 166, in enable
    reuse_existing=True)
  File "/opt/stack/new/neutron/neutron/agent/linux/dhcp.py", line 803, in setup
    self._set_default_route(network)
  File "/opt/stack/new/neutron/neutron/agent/linux/dhcp.py", line 669, in _set_default_route
    gateway = device.route.get_gateway()
  File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 388, in get_gateway
    *filters).split('\n')
  File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 211, in _run
    return self._parent._run(kwargs.get('options', []), self.COMMAND, args)
  File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 52, in _run
    return self._as_root(options, command, args)
  File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 70, in _as_root
    namespace)
  File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 81, in _execute
    root_helper=root_helper)
  File "/opt/stack/new/neutron/neutron/agent/linux/utils.py", line 76, in execute
    raise RuntimeError(m)
RuntimeError:
Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qdhcp-3236628d-4ccd-4c29-a2b6-ca4752725874', 'ip', 'route', 'list', 'dev', 'tap88655de5-5f']
Exit code: 1
Stdout: ''
Stderr: 'Cannot find device "tap88655de5-5f"\n'

tags: added: l3-ipam-dhcp
Darragh O'Reilly (darragh-oreilly) wrote :

Patch https://review.openstack.org/#/c/79282/ should fix this. It removes the RPC call to get_dhcp_port. Instead, the callers of _set_default_route() pass the interface name to it.
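
A rough sketch of the shape of that change, not the code in the patch: DeviceManager and RecordingRouter here are toy classes, their method signatures are assumptions made for illustration, and the port UUID is made up apart from the "88655de5-5f" prefix seen in the 'enable' traceback. The point is only that the caller already knows which device it plugged, so no second port lookup is needed.

```python
class RecordingRouter:
    """Hypothetical stand-in for the 'ip route' calls made in the namespace."""

    def __init__(self):
        self.devices = set()  # devices that exist in the namespace
        self.queried = []     # devices whose routes were listed

    def get_gateway(self, device_name):
        if device_name not in self.devices:
            raise RuntimeError('Cannot find device "%s"' % device_name)
        self.queried.append(device_name)
        return None  # no default route configured in this toy model


class DeviceManager:
    """Toy model of the agent-side device manager; not the Neutron class."""

    def __init__(self, router):
        self.router = router

    def setup(self, port_id):
        interface_name = ("tap" + port_id)[:14]  # assumed naming scheme
        self.router.devices.add(interface_name)  # pretend the device is plugged
        # The caller hands over the name of the device it just created.
        self._set_default_route(interface_name)
        return interface_name

    def update(self, interface_name):
        # On reload, the caller reuses the name it already holds.
        self._set_default_route(interface_name)

    def _set_default_route(self, device_name):
        # No RPC to the plugin here: the device name is passed in.
        return self.router.get_gateway(device_name)


router = RecordingRouter()
manager = DeviceManager(router)
name = manager.setup("88655de5-5f00-4000-8000-000000000000")  # made-up UUID
manager.update(name)
print(router.queried)
```

Because the name used for the route lookup is the same one used to plug the device, a port deleted and recreated on the server side can no longer make the agent query a device that does not exist.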

Kyle Mestery (mestery)
Changed in neutron:
importance: Undecided → Medium
Openstack Gerrit (openstack-gerrit) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/79282
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9dbd1e5e5a41eff88a044b8de992d2f1f14898b3
Submitter: Jenkins
Branch: master

commit 9dbd1e5e5a41eff88a044b8de992d2f1f14898b3
Author: Darragh O'Reilly <email address hidden>
Date: Sun Mar 9 15:14:03 2014 +0000

    Remove RPC to plugin when dhcp sets default route

    _set_default_route() was using an RPC to the plugin to get the DHCP
    port for the network on the current host, and then used it to form
    the tap device name. This happened on every allocation reload too.
    This fix removes the RPC and gets the tap device name using local
    methods instead. It also removes an unnecessary call to set the
    default route in the restart method.

    Closes-Bug: 1290068
    Related-Bug: 1294254
    Change-Id: I639bcf93725c4969d1011d2d20491d461ccfdbed

Kyle Mestery (mestery) wrote :

Darragh, can you confirm the fix which was merged on March 9 addresses this bug?

Changed in neutron:
assignee: nobody → Darragh O'Reilly (darragh-oreilly)
status: New → Fix Committed
Darragh O'Reilly (darragh-oreilly) wrote :

Kyle, yes. The fix was merged on 2014-04-22 and the logstash query above returns nothing after that.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/icehouse)

Related fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/98797

Thierry Carrez (ttx)
Changed in neutron:
milestone: none → juno-1
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/icehouse)

Reviewed: https://review.openstack.org/98797
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9124db5e91e88d21ccd8ca4cffe55782b024ce55
Submitter: Jenkins
Branch: stable/icehouse

commit 9124db5e91e88d21ccd8ca4cffe55782b024ce55
Author: Darragh O'Reilly <email address hidden>
Date: Sun Mar 9 15:14:03 2014 +0000

    Remove RPC to plugin when dhcp sets default route

    _set_default_route() was using an RPC to the plugin to get the DHCP
    port for the network on the current host, and then used it to form
    the tap device name. This happened on every allocation reload too.
    This fix removes the RPC and gets the tap device name using local
    methods instead. It also removes an unnecessary call to set the
    default route in the restart method.

    Closes-Bug: 1290068
    Related-Bug: 1294254
    Change-Id: I639bcf93725c4969d1011d2d20491d461ccfdbed
    (cherry picked from commit 9dbd1e5e5a41eff88a044b8de992d2f1f14898b3)

tags: added: in-stable-icehouse
Thierry Carrez (ttx)
Changed in neutron:
milestone: juno-1 → 2014.2