source-group security group rule not working with nova networking

Bug #1118608 reported by Andrea Frittoli
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Phil Day

Bug Description

Using a source-group type of rule with nova networking leads to the VM not spawning (it goes into ERROR state) when the source and destination group are the same.

Steps:
- Fresh devstack with default services (n-net and not quantum)
- nova secgroup-create nasty nasty
- nova secgroup-add-group-rule nasty nasty icmp -1 -1
- nova boot --image 68e97f3c-f160-48f9-9ce1-b6d8871c1f8b --flavor 1 --security-group nasty testvm0001

The VM goes into ERROR state.
The issue is consistently reproducible.

Using Grizzly-2

ubuntu@devstack02:/opt/stack/nova$ git log -1 | head -2
commit 9ba00e5ab8eec49fddd9ca03fdd9c07c41b088b7
Merge: d2dad24 905b784

Revision history for this message
Andrea Frittoli (andrea-frittoli) wrote :
summary: - security group group rule not working in nova networking
+ source-group security group rule not working with nova networking
description: updated
Revision history for this message
Phil Day (philip-day) wrote :

Stack trace shows:

2013-02-08 13:58:05.300 ERROR nova.openstack.common.rpc.amqp [req-2e8fe45b-10db-49a9-8935-a6186d2377d4 None None] Exception during message handling
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/amqp.py", line 276, in _process_data
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp rval = self.proxy.dispatch(ctxt, version, method, **args)
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/dispatcher.py", line 133, in dispatch
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/exception.py", line 109, in wrapped
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp temp_level, payload)
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/exception.py", line 88, in wrapped
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 583, in refresh_instance_security_rules
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp return self.driver.refresh_instance_security_rules(instance)
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2437, in refresh_instance_security_rules
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp self.firewall_driver.refresh_instance_security_rules(instance)
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/virt/firewall.py", line 433, in refresh_instance_security_rules
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp self.do_refresh_instance_rules(instance)
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/virt/firewall.py", line 451, in do_refresh_instance_rules
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp ipv4_rules, ipv6_rules = self.instance_rules(instance, network_info)
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/virt/firewall.py", line 403, in instance_rules
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp instance)
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/network/api.py", line 88, in wrapped
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp return func(self, context, *args, **kwargs)
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/network/api.py", line 338, in get_instance_nw_info
2013-02-08 13:58:05.300 TRACE nova.openstack.common.rpc...


Revision history for this message
Phil Day (philip-day) wrote :

It looks as if the instance structure being passed across from compute/api/trigger_rules_refresh doesn't have the instance_type structure populated.

The instance comes from db.security_group_get()

Not sure why this would be, nor why it should matter in this context.

Revision history for this message
Phil Day (philip-day) wrote :

Actually there are multiple exceptions in here, and I think the first may be a race condition:

While creating the instance, the compute manager makes an RPC call to "allocate_for_instance" on the network manager, which in turn ends up making an RPC cast back to the compute manager to call refresh_instance_security_rules. Since the instance being created is in a group that has a rule referring to itself, that will end up trying to refresh the rules for the new instance.

That call finds its way down to do_refresh_instance_rules() in libvirt/firewall, which tries to get network_info from the global "network_infos" in firewall.

But the entry in network_infos for this instance might not have been added yet, as that only happens when the creation of the instance calls "prepare_instance_filter", which comes later in the sequence as part of driver.spawn.

I *think* that explains the first exception. I don't know if there is a knock-on effect for the others.
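The ordering problem described above can be sketched in miniature. This is not the real nova code; the names (network_infos, prepare_instance_filter, do_refresh_instance_rules) merely mirror the firewall driver, and the sequence shows the refresh cast arriving before spawn has cached the network info:

```python
# Minimal sketch of the race: a refresh for an instance can arrive
# before the instance's network_info has been cached during spawn.

network_infos = {}  # populated by prepare_instance_filter during spawn


def prepare_instance_filter(instance_id, network_info):
    """Runs later in the spawn sequence, as part of driver.spawn."""
    network_infos[instance_id] = network_info


def do_refresh_instance_rules(instance_id):
    """Refresh cast triggered by the self-referencing group rule."""
    if instance_id not in network_infos:
        raise KeyError("network_info not yet cached for %s" % instance_id)
    return network_infos[instance_id]


# Sequence for a group whose rule refers to itself:
try:
    do_refresh_instance_rules("new-vm")   # cast arrives too early
except KeyError as exc:
    print("refresh failed:", exc)

prepare_instance_filter("new-vm", {"ips": ["10.0.0.2"]})  # spawn catches up
print(do_refresh_instance_rules("new-vm"))                # now succeeds
```

Any serialization of the two events where the refresh lands first reproduces the failure, which is why the bug shows up consistently for self-referencing rules.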

Revision history for this message
Jay Pipes (jaypipes) wrote :

The line in the compute log from Andrea that I think is the most telling is this:

2013-02-07 17:27:48.97 TRACE nova.openstack.common.rpc.amqp RemoteError: Remote error: DetachedInstanceError Parent instance <Instance at 0x4338190> is not bound to a Session; lazy load operation of attribute 'system_metadata' cannot proceed

Pretty sure that DB session handling will be the culprit here.
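The failure mode Jay describes can be modeled without a database. This is a pure-Python sketch (no SQLAlchemy) of why a lazy-loaded attribute such as system_metadata fails once the session that loaded the object is gone, e.g. after the object has been handed across an RPC boundary; the Session and Instance classes here are illustrative stand-ins:

```python
# Sketch of a detached-instance failure: lazy loading requires the
# owning session, so access fails after the session is closed.

class Session:
    def __init__(self):
        self.open = True

    def close(self):
        self.open = False

    def load(self, attr):
        if not self.open:
            raise RuntimeError(
                "DetachedInstanceError: lazy load operation of attribute "
                "%r cannot proceed" % attr)
        return {"system_metadata": {"instance_type_id": 1}}[attr]


class Instance:
    def __init__(self, session):
        self._session = session

    @property
    def system_metadata(self):
        # Lazy-loaded: only works while the owning session is alive.
        return self._session.load("system_metadata")


session = Session()
inst = Instance(session)
print(inst.system_metadata)   # session still open: loads fine

session.close()               # e.g. object passed out of the DB layer
try:
    inst.system_metadata
except RuntimeError as exc:
    print(exc)                # mirrors the DetachedInstanceError in the log
```

In real SQLAlchemy terms, the fix is to eagerly load (or copy out) the attributes needed before the object leaves the session's scope.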

Revision history for this message
Phil Day (philip-day) wrote :

I believe that the second issue here is a problem in jsonutils/to_primitive, which imposes a seemingly arbitrary maximum recursion depth of 3.

The query that is breaking this is security_group_rule_get_by_security_group, which returns a set of rules, which can have the structure:

rule -> grantee_group -> Instance -> Instance_type

Running devstack with this max level set to 4 fixes the issue.
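The depth effect can be illustrated with a toy converter. This is a hypothetical sketch, not the real jsonutils.to_primitive signature, but it shows why the four-level rule -> grantee_group -> instance -> instance_type nesting is truncated at a maximum depth of 3 and survives at 4:

```python
# Toy depth-limited conversion: nested values beyond max_depth are
# replaced with a placeholder instead of being serialized.

def to_primitive(value, level=0, max_depth=3):
    if level > max_depth:
        return '?'  # nested data silently truncated
    if isinstance(value, dict):
        return {k: to_primitive(v, level + 1, max_depth)
                for k, v in value.items()}
    return value


# Illustrative data mirroring the nesting named in the comment above.
rule = {'grantee_group': {'instance': {'instance_type': {'name': 'm1.tiny'}}}}

print(to_primitive(rule, max_depth=3))  # instance_type contents lost
print(to_primitive(rule, max_depth=4))  # full structure survives
```

With max_depth=3 the values inside instance_type sit at level 4 and come back as the placeholder; raising the limit to 4 preserves them, matching the devstack observation above.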

Phil Day (philip-day)
Changed in nova:
assignee: nobody → Phil Day (philip-day)
status: New → In Progress
Revision history for this message
Phil Day (philip-day) wrote :
Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-3 → 2013.1