nova-compute crashes when applying a security group rule

Bug #838419 reported by Andrew Glen-Young
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Soren Hansen

Bug Description

After upgrading nova-compute from 2011.3~d4~20110812.1417-0ubuntu1 to 2011.3~d4-0ubuntu1 on Oneiric, the nova-compute daemon crashes with the message below. It looks to me like one of the security groups no longer validates correctly. I have also included the user's security groups.

Restarting nova-compute produces the same crash every time.

Crash/error log:

2011-08-31 19:04:52,680 DEBUG nova.virt.libvirt.firewall [-] Adding security group rule: <nova.db.sqlalchemy.models.SecurityGroupIngressRule object at 0x4eec210> from (pid=22771) instance_rules /usr/lib/pymodules/python2.7/nova/virt/libvirt/firewall.py:650
2011-08-31 19:04:52,680 INFO nova.virt.libvirt.firewall [-] Using cidr '0.0.0.0/0'
2011-08-31 19:04:52,680 INFO nova.virt.libvirt.firewall [-] Using fw_rules: ['-m state --state INVALID -j DROP', '-m state --state ESTABLISHED,RELATED -j ACCEPT', '-j $provider', u'-s 172.16.60.1 -p udp --sport 67 --dport 68 -j ACCEPT', u'-s 172.16.60.0/24 -j ACCEPT', '-j ACCEPT -p tcp --dport 22 -s 0.0.0.0/0']
2011-08-31 19:04:52,680 DEBUG nova.virt.libvirt.firewall [-] Adding security group rule: <nova.db.sqlalchemy.models.SecurityGroupIngressRule object at 0x4eec290> from (pid=22771) instance_rules /usr/lib/pymodules/python2.7/nova/virt/libvirt/firewall.py:650
2011-08-31 19:04:52,715 INFO nova.virt.libvirt.firewall [-] instance: <nova.db.sqlalchemy.models.Instance object at 0x4203850>
2011-08-31 19:04:52,770 INFO nova.virt.libvirt.firewall [-] ips: ['172.16.60.26']
2011-08-31 19:04:52,771 CRITICAL nova [-] sequence item 2: expected string, NoneType found
(nova): TRACE: Traceback (most recent call last):
(nova): TRACE: File "/usr/bin/nova-compute", line 48, in <module>
(nova): TRACE: service.wait()
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/service.py", line 367, in wait
(nova): TRACE: _launcher.wait()
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/service.py", line 107, in wait
(nova): TRACE: service.wait()
(nova): TRACE: File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 166, in wait
(nova): TRACE: return self._exit_event.wait()
(nova): TRACE: File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
(nova): TRACE: return hubs.get_hub().switch()
(nova): TRACE: File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 177, in switch
(nova): TRACE: return self.greenlet.switch()
(nova): TRACE: File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
(nova): TRACE: result = function(*args, **kwargs)
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/service.py", line 77, in run_server
(nova): TRACE: server.start()
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/service.py", line 137, in start
(nova): TRACE: self.manager.init_host()
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/compute/manager.py", line 175, in init_host
(nova): TRACE: net_info)
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/virt/libvirt/connection.py", line 1583, in ensure_filtering_rules_for_instance
(nova): TRACE: network_info)
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/virt/libvirt/firewall.py", line 546, in prepare_instance_filter
(nova): TRACE: self.add_filters_for_instance(instance)
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/virt/libvirt/firewall.py", line 582, in add_filters_for_instance
(nova): TRACE: ipv4_rules, ipv6_rules = self.instance_rules(instance, network_info)
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/virt/libvirt/firewall.py", line 707, in instance_rules
(nova): TRACE: fw_rules += [' '.join(subrule)]
(nova): TRACE: TypeError: sequence item 2: expected string, NoneType found
(nova): TRACE:

Instance user's security groups:

$ euca-describe-groups | grep user
GROUP user_project default default
GROUP user_project app-internal Ensemble group for internal
PERMISSION user_project app-internal ALLOWS tcp 22 22 FROM CIDR 0.0.0.0/0
PERMISSION user_project app-internal ALLOWS GRPNAME app-internal
GROUP user_project app-internal-0 Ensemble group for internal machine 0
GROUP user_project app-internal-1 Ensemble group for internal machine 1
GROUP user_project app-internal-2 Ensemble group for internal machine 2


Changed in nova:
importance: Undecided → High
Andrew Glen-Young (aglenyoung) wrote :

I have tracked this down to security groups.

Steps to reproduce:

1. Add a group:
    $ euca-add-group user -d "test group"

2. Authorize the group:
    $ euca-authorize --source-group user user

3. Show the groups:
    $ euca-describe-groups
    GROUP user_project user test group
    PERMISSION user_project user ALLOWS GRPNAME user
    GROUP user_project default default
    PERMISSION user_project default ALLOWS tcp 22 22 FROM CIDR 0.0.0.0/0

4. Start an instance with this security group
5. Stop and start nova-compute
6. nova-compute crashes

The resulting database record is:

sql> SELECT * FROM security_group_rules WHERE protocol IS NULL;
+---------------------+------------+------------+---------+-----+-----------------+----------+-----------+---------+------+----------+
| created_at | updated_at | deleted_at | deleted | id | parent_group_id | protocol | from_port | to_port | cidr | group_id |
+---------------------+------------+------------+---------+-----+-----------------+----------+-----------+---------+------+----------+
| 2011-08-19 05:57:43 | NULL | NULL | 0 | 154 | 78 | NULL | NULL | NULL | NULL | 78 |
+---------------------+------------+------------+---------+-----+-----------------+----------+-----------+---------+------+----------+

The NULL protocol produces the following args in nova/virt/libvirt/firewall.py:
args: ['-j ACCEPT', '-p', None]

Joining a list that contains None with ' '.join() raises the TypeError shown in the traceback, which crashes nova-compute.
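
The failure can be reproduced outside nova. The following is a minimal sketch, not the code from nova or from the eventual fix: build_rule and its parameters are hypothetical names, and it assumes the caller has already resolved the rule's source to a CIDR or a group member's IP address. It only shows that omitting "-p" when the protocol is None avoids passing None to join. The exact TypeError wording differs between Python versions.

```python
def build_rule(protocol, from_port, to_port, source):
    """Assemble an iptables argument string, skipping unset fields.

    Hypothetical helper for illustration; not part of nova.
    """
    args = ["-j ACCEPT"]
    if protocol is not None:
        # Only emit "-p" when the rule actually names a protocol.
        args += ["-p", protocol]
        if protocol in ("tcp", "udp") and from_port is not None:
            if from_port == to_port:
                args += ["--dport", str(from_port)]
            else:
                args += ["-m", "multiport",
                         "--dports", "%s:%s" % (from_port, to_port)]
    args += ["-s", source]
    return " ".join(args)


# The failing pattern from the report: joining a list that contains None.
try:
    " ".join(["-j ACCEPT", "-p", None])
    failure = None
except TypeError as exc:
    failure = str(exc)

# A CIDR rule like the tcp/22 one in the log, and a group-only rule
# (protocol and ports all None) applied to one member address.
ssh_rule = build_rule("tcp", 22, 22, "0.0.0.0/0")
group_rule = build_rule(None, None, None, "172.16.60.26")
```

Whether a group-only rule should accept all protocols from group members is a policy decision that this sketch does not settle; it simply leaves the protocol unrestricted when none is recorded.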

Temporary workaround:

Delete the source-group authorization that created the rule with the NULL protocol.

Thierry Carrez (ttx)
Changed in nova:
status: New → Triaged
Thierry Carrez (ttx)
tags: added: security-group
Soren Hansen (soren) wrote :

Thanks for reporting this. I've reproduced it and added a test for it. I should have a fix shortly.

Changed in nova:
assignee: nobody → Soren Hansen (soren)
status: Triaged → In Progress
Soren Hansen (soren) wrote :

Andrew, any chance you could take the patch for a spin, just to be sure?

    http://bazaar.launchpad.net/~soren/nova/secgroup-fixes/revision/1524

Soren Hansen (soren) wrote :

Thanks for the excellent analysis, by the way.

Andrew Glen-Young (aglenyoung) wrote :

@Soren: The patch seems to work for me.

= Testing after patching and restarting nova-compute =

$ euca-describe-groups
GROUP user_project default default
PERMISSION user_project default ALLOWS tcp 22 22 FROM CIDR 0.0.0.0/0

$ euca-add-group user -d "test group"
GROUP user test group

$ euca-authorize --source-group user user
GROUP user
PERMISSION user ALLOWS tcp GRPNAME user FROM CIDR 0.0.0.0/0

$ euca-describe-groups
GROUP user_project user test group
PERMISSION user_project user ALLOWS GRPNAME user
GROUP user_project default default
PERMISSION user_project default ALLOWS tcp 22 22 FROM CIDR 0.0.0.0/0

$ euca-run-instances -k user -g user -t m1.tiny ami-00000002
RESERVATION r-8o0h5u0i user_project user
INSTANCE i-0000029e ami-00000002 scheduling user 0 m1.tiny 2011-09-02T13:33:27Z unknown zone aki-00000001 ami-00000000

$ euca-describe-instances i-0000029e
RESERVATION r-8o0h5u0i user_project user
INSTANCE i-0000029e ami-00000002 172.16.60.71 172.16.60.71 running user 0 m1.tiny 2011-09-02T13:33:27Z nova aki-00000001 ami-00000000

/var/log/nova/nova-compute.log:
2011-09-02 14:35:11,085 INFO nova.virt.libvirt_conn [-] Instance instance-0000029e spawned successfully.

Restarting nova-compute again with the running instance does not cause nova-compute to crash.

Thierry Carrez (ttx)
Changed in nova:
milestone: none → 2011.3
Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released