OpenStack Compute (nova)

Bug #1007038
Comment #4

Comment 4 for bug 1007038

Revision history for this message

aeva black (tenbrae) wrote on 2012-08-24:

So after digging into this again, I feel it isn't safe to fix this right now.

The issue is that, without the pool_reset_on_return flag, Nova is leaving dangling transactions open. I initially thought this would be OK, but it is definitely _not_ok_, and even devstack/exercise.sh blows up. Not only do some parts of the code fail to close their transaction, but other areas also fail to start a new transaction -- instead getting the context of some other random old transaction. This results in unpredictable errors. Here's an example:

nova/network/manager.py

1233 def allocate_fixed_ip(self, context, instance_id, network, **kwargs):
...
1255 get_vif = self.db.virtual_interface_get_by_instance_and_network
1256 vif = get_vif(context, instance_ref['uuid'], network['id'])
1257 values = {'allocated': True,
1258 'virtual_interface_id': vif['id']}

In my tests, devstack/exercises/floating_ips.sh will randomly fail, but when it fails it is always here. The error is:
'NoneType' object has no attribute '__getitem__'

I believe the cause of this is that the database session which gets used by virtual_interface_get_by_instance_and_network() has a pre-existing transactional context, and is seeing an older version of the database than the caller. In my tests, the age of this transaction was often > 10 seconds old by the time it was actually used. In that older context, the requested record did not exist, so no rows are returned by the query.

So, IMHO, disabling pool_reset_on_return is dangerous until all the Nova DB code handles transactions consistently; it needs to all be auto_commit, or all open and close transactions, but right now there is a mixture.