KeyError in neutron server when L2 agent requests info for devices

Bug #1372337 reported by shihanzhang
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Critical
Assigned to: shihanzhang

Bug Description

If the L2 agent uses enhanced-security-group-rpc, the following steps cause a KeyError in the neutron server:
1. Create a security group with an IPv6 ingress rule but no IPv4 ingress rule
  (or delete the IPv4 ingress rule from the default security group).
2. Launch a VM on an IPv4 subnet, making it a member of the security group created earlier.

The instance will not get its network info. The neutron server starts reporting the following errors and returns them to the agent on each request for device info:

2014-09-24 02:01:51.353 ERROR oslo.messaging.rpc.dispatcher [req-9b631b65-a753-4292-8442-98936a31db74 None None] Exception during message handling: 'IPv4'
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher incoming.message))
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher result = getattr(endpoint, method)(ctxt, **new_args)
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/neutron/neutron/api/rpc/handlers/securitygroups_rpc.py", line 75, in security_group_info_for_devices
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher return self.plugin.security_group_info_for_ports(context, ports)
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/neutron/neutron/db/securitygroups_rpc_base.py", line 201, in security_group_info_for_ports
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher return self._get_security_group_member_ips(context, sg_info)
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/neutron/neutron/db/securitygroups_rpc_base.py", line 209, in _get_security_group_member_ips
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher if ip not in sg_info['sg_member_ips'][sg_id][ethertype]:
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher KeyError: 'IPv4'
2014-09-24 02:01:51.353 TRACE oslo.messaging.rpc.dispatcher
2014-09-24 02:01:51.354 ERROR oslo.messaging._drivers.common [req-9b631b65-a753-4292-8442-98936a31db74 None None] Returning exception 'IPv4' to caller

Changed in neutron:
assignee: nobody → shihanzhang (shihanzhang)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/123095

Changed in neutron:
status: New → In Progress
Oleg Bondarev (obondarev) wrote : Re: Security group KeyError

I was unable to reproduce this on devstack. After creating a port with IPv4 address in default security group and then deleting default IPv6 ingress rule from default security group I don't see any traces in q-svc. Can you please add more details on how to reproduce?

Changed in neutron:
status: In Progress → Incomplete
shihanzhang (shihanzhang) wrote :

Hi Oleg Bondarev, the port has to be in use by a VM. Please try again:
1. create a port with an 'IPv4' address in the default security group
2. create a VM using this port
3. delete the default 'IPv6' ingress rule

I am sorry about this bug. If you can reproduce it, please help me change 'Importance' to High so that other reviewers pay close attention to it.

Changed in neutron:
status: Incomplete → In Progress
Oleg Bondarev (obondarev) wrote :

Thanks for the update, however I'm still not able to reproduce...

shihanzhang (shihanzhang) wrote :

I'm sorry, the correct steps are:

1. create a port with an 'IPv4' address in the default security group
2. create a VM using this port
3. delete the default 'IPv4' (not IPv6) ingress rule

    def _get_security_group_member_ips(self, context, sg_info):
        ips = self._select_ips_for_remote_group(
            context, sg_info['sg_member_ips'].keys())
        for sg_id, member_ips in ips.items():
            for ip in member_ips:
                ethertype = 'IPv%d' % netaddr.IPAddress(ip).version
                # KeyError is raised on the next line when the group has
                # no entry for this ethertype.
                if ip not in sg_info['sg_member_ips'][sg_id][ethertype]:
                    sg_info['sg_member_ips'][sg_id][ethertype].append(ip)

'_select_ips_for_remote_group' selects both the IPv4 and IPv6 addresses in the given remote security group. If there is a port with an IPv4 address in the remote security group but no IPv4 rule,
sg_info['sg_member_ips'][sg_id][ethertype] raises a KeyError.
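
The failure can be reproduced outside neutron with plain dictionaries. The sketch below is not neutron code: the `sg_info` layout is inferred from the traceback, the function name `add_member_ips`, the group ID and the addresses are made up for illustration, and the standard-library `ipaddress` module stands in for `netaddr`.

```python
import ipaddress

# Hypothetical data: a group whose rules only cover IPv6, so only the
# 'IPv6' key exists for it (layout inferred from the traceback).
sg_info = {'sg_member_ips': {'sg-1': {'IPv6': []}}}

# Hypothetical result of the member lookup: it returns every address of
# the group's ports, regardless of which ethertypes the rules mention.
ips = {'sg-1': ['10.0.0.5', 'fe80::1']}


def add_member_ips(sg_info, ips):
    """Mirror the loop quoted above, with no guard for missing keys."""
    for sg_id, member_ips in ips.items():
        for ip in member_ips:
            ethertype = 'IPv%d' % ipaddress.ip_address(ip).version
            if ip not in sg_info['sg_member_ips'][sg_id][ethertype]:
                sg_info['sg_member_ips'][sg_id][ethertype].append(ip)


try:
    add_member_ips(sg_info, ips)
    error = None
except KeyError as exc:
    # The IPv4 member address has no matching ethertype entry.
    error = exc.args[0]

print(error)
```

With this data the lookup fails on the IPv4 address, which matches the `KeyError: 'IPv4'` in the server log.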

summary: - Security group KeyError
+ Security group happen errors when L2 agent use enhanced-security-group-
+ rpc
description: updated
description: updated
summary: - Security group happen errors when L2 agent use enhanced-security-group-
- rpc
+ KeyError in neutron server when L2 agent requests info for devices
Oleg Bondarev (obondarev) wrote :

Reproduced the bug and updated description

description: updated
Changed in neutron:
importance: Undecided → High
shihanzhang (shihanzhang) wrote :

Hi Oleg Bondarev, in this case ipset also has a problem. Do you think we should handle these two bugs together, or file the ipset one as a separate bug?

Oleg Bondarev (obondarev) wrote :

If the root cause is the same and these are just two symptoms of it, I don't think we need another bug.

Oleg Bondarev (obondarev) wrote :

So the bug can be reproduced with an even simpler workflow: just create a new security group with an IPv6 ingress rule (and no IPv4 rule) and launch an instance. Going to raise to Critical.

description: updated
description: updated
Changed in neutron:
importance: High → Critical
Kyle Mestery (mestery)
Changed in neutron:
milestone: none → juno-rc1
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/123095
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=55f6a8ac5d234f004ef06add87d16284e9f048d3
Submitter: Jenkins
Branch: master

commit 55f6a8ac5d234f004ef06add87d16284e9f048d3
Author: shihanzhang <email address hidden>
Date: Mon Sep 22 17:28:06 2014 +0800

    Fix KeyError when getting secgroup info for ports

    The patch fixes a regression introduced with secgroup rpc refactor by
    handling the case when security group contains rules for only IPv4 or
    IPv6.

    Change-Id: I02b174757bfc796a81cdb482c55ba7f9e954131d
    Closes-bug: #1372337
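
A group that has rules for only one ethertype can be tolerated by checking for the key before using it. The sketch below only illustrates that idea and is not the merged change (the review linked above has the actual diff). The function name `add_member_ips`, the `sg_info` layout, the group ID and the addresses are assumptions, and the standard-library `ipaddress` module stands in for `netaddr`.

```python
import ipaddress


def add_member_ips(sg_info, ips):
    """Add member IPs, skipping ethertypes the group has no entry for."""
    for sg_id, member_ips in ips.items():
        by_ethertype = sg_info['sg_member_ips'][sg_id]
        for ip in member_ips:
            ethertype = 'IPv%d' % ipaddress.ip_address(ip).version
            members = by_ethertype.get(ethertype)
            if members is None:
                # No entry for this ethertype in the group: skip the
                # address instead of raising KeyError.
                continue
            if ip not in members:
                members.append(ip)
    return sg_info


# Hypothetical IPv6-only group with one IPv4 and one IPv6 member address.
sg_info = {'sg_member_ips': {'sg-1': {'IPv6': []}}}
result = add_member_ips(sg_info, {'sg-1': ['10.0.0.5', 'fe80::1']})
print(result)
```

Skipping the address is a design choice made for this sketch. Creating the missing list on demand (for example with `dict.setdefault`) would be another option, with different consequences for what the agent receives.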

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
tags: added: enhanced-rpc
tags: added: sg-enhanced-rpc
removed: enhanced-rpc
Thierry Carrez (ttx)
Changed in neutron:
milestone: juno-rc1 → 2014.2
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/lbaasv2)

Fix proposed to branch: feature/lbaasv2
Review: https://review.openstack.org/130864

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (feature/lbaasv2)

Reviewed: https://review.openstack.org/130864
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c089154a94e5872efc95eab33d3d0c9de8619fe4
Submitter: Jenkins
Branch: feature/lbaasv2

commit 62588957fbeccfb4f80eaa72bef2b86b6f08dcf8
Author: Kevin Benton <email address hidden>
Date: Wed Oct 22 13:04:03 2014 -0700

    Big Switch: Switch to TLSv1 in server manager

    Switch to TLSv1 for the connections to the backend
    controllers. The default SSLv3 is no longer considered
    secure.

    TLSv1 was chosen over .1 or .2 because the .1 and .2 weren't
    added until python 2.7.9 so TLSv1 is the only compatible option
    for py26.

    Closes-Bug: #1384487
    Change-Id: I68bd72fc4d90a102003d9ce48c47a4a6a3dd6e03

commit 17204e8f02fdad046dabdb8b31397289d72c877b
Author: OpenStack Proposal Bot <email address hidden>
Date: Wed Oct 22 06:20:15 2014 +0000

    Imported Translations from Transifex

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I58db0476c810aa901463b07c42182eef0adb5114

commit d712663b99520e6d26269b0ca193527603178742
Author: Carl Baldwin <email address hidden>
Date: Mon Oct 20 21:48:42 2014 +0000

    Move disabling of metadata and ipv6_ra to _destroy_router_namespace

    I noticed that disable_ipv6_ra is called from the wrong place and that
    in some cases it was called with a bogus router_id because the code
    made an incorrect assumption about the context. In other case, it was
    never called because _destroy_router_namespace was being called
    directly. This patch moves the disabling of metadata and ipv6_ra in
    to _destroy_router_namespace to ensure they get called correctly and
    avoid duplication.

    Change-Id: Ia76a5ff4200df072b60481f2ee49286b78ece6c4
    Closes-Bug: #1383495

commit f82a5117f6f484a649eadff4b0e6be9a5a4d18bb
Author: OpenStack Proposal Bot <email address hidden>
Date: Tue Oct 21 12:11:19 2014 +0000

    Updated from global requirements

    Change-Id: Idcbd730f5c781d21ea75e7bfb15959c8f517980f

commit be6bd82d43fbcb8d1512d8eb5b7a106332364c31
Author: Angus Lees <email address hidden>
Date: Mon Aug 25 12:14:29 2014 +1000

    Remove duplicate import of constants module

    .. and enable corresponding pylint check now the only offending instance
    is fixed.

    Change-Id: I35a12ace46c872446b8c87d0aacce45e94d71bae

commit 9902400039018d77aa3034147cfb24ca4b2353f6
Author: rajeev <email address hidden>
Date: Mon Oct 13 16:25:36 2014 -0400

    Fix race condition on processing DVR floating IPs

    Fip namespace and agent gateway port can be shared by multiple dvr routers.
    This change uses a set as the control variable for these shared resources
    and ensures that Test and Set operation on the control variable are
    performed atomically so that race conditions do not occur among
    multiple threads processing floating IPs.
    Limitation: The scope of this change is limited to addressing the race
    condition described in the bug report. It may not address other issues
    such as pre-existing issue wit...
