Nova is issuing unnecessary ROLLBACK statements to MySQL

Bug #1007038 reported by aeva black
This bug affects 4 people
Affects                    Status    Importance  Assigned to  Milestone
Cinder                     Invalid   Low         Unassigned
OpenStack Compute (nova)   Invalid   Low         Unassigned
oslo.db                    Expired   Low         Unassigned

Bug Description

I'm not sure exactly where this is coming from yet, but Nova is issuing a "ROLLBACK" to MySQL after nearly every SELECT statement, even though I believe the connection should be in autocommit mode. This is unnecessary and wastes time (a network round trip) and resources (database CPU cycles).

I suspect this is generated by SQLAlchemy whenever a connection is handed back to the pool, since the number of ROLLBACKs roughly coincides with the number of "select 1" statements I see in the logs; those are issued by the MySQLPingListener when a connection is taken out of the pool.

I opened a bug already for the unnecessary "select 1" statements, but I'm opening this as a separate bug. If someone finds a way to fix both at once, that'd be great.
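To make the suspected mechanism concrete, here is a toy single-connection pool that mimics the reset-on-return behaviour described above. This is a sketch using stdlib sqlite3 rather than MySQL, and `MiniPool` is an illustrative name, not anything from Nova or SQLAlchemy:

```python
import sqlite3

class MiniPool:
    """Toy single-connection pool mimicking pool reset-on-return.

    Every check-in issues a ROLLBACK, which is exactly the traffic the
    bug report sees in the MySQL query log after each SELECT.
    """
    def __init__(self, reset_on_return=True):
        self.reset_on_return = reset_on_return
        self.rollbacks = 0
        self._conn = sqlite3.connect(":memory:", isolation_level=None)

    def checkout(self):
        return self._conn

    def checkin(self, conn):
        if self.reset_on_return:
            conn.rollback()      # the "unnecessary" ROLLBACK round trip
            self.rollbacks += 1

pool = MiniPool()
for _ in range(3):               # three SELECT-only requests
    c = pool.checkout()
    c.execute("SELECT 1").fetchone()
    pool.checkin(c)
print(pool.rollbacks)            # -> 3: one ROLLBACK per check-in
```

With `reset_on_return=False` the counter stays at zero, which is the saving the bug is after; the question is whether anything was relying on that cleanup.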

Tags: db
Revision history for this message
aeva black (tenbrae) wrote :

Attaching a sample of query log showing the "rollback" statements. For reference, here are the query counts in the sample. This was taken with long_query_time=0, in devstack, during a call to "nova add-floating-ip".

grep -c '# Query_time: ' log-sample.floating-ip-create.log
35
grep -ciP '^INSERT|^UPDATE|^DELETE' log-sample.floating-ip-create.log
1
grep -ciP '^SELECT [^1]' log-sample.floating-ip-create.log
11
grep -cP '^rollback' log-sample.floating-ip-create.log
11

Revision history for this message
aeva black (tenbrae) wrote :

Found the source of this: it is coming from SQLAlchemy. It also affects Keystone (and probably anything else using SQLAlchemy).

http://docs.sqlalchemy.org/en/rel_0_7/core/pooling.html?highlight=reset_on_return#api-documentation-available-pool-implementations

Disabling pool_reset_on_return does indeed stop the ROLLBACK statements, but it reveals another issue: some areas of the Nova code start transactions, change no rows, and then never close the transaction. This shows up in "SHOW INNODB STATUS\G" output as entries like the one below. In production this could be bad, since long-running transactions tie up database resources; on the other hand, it may not matter in a high-traffic environment where connections are re-used very quickly.

---TRANSACTION 0 1471743, ACTIVE 1801 sec, process no 18808, OS thread id 139970159761152
MySQL thread id 58, query id 13983 localhost stack
Trx read view will not see trx with id >= 0 1471744, sees < 0 1471708
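The dangling-transaction pattern behind an entry like that ACTIVE 1801 sec one can be reproduced in miniature with stdlib sqlite3 (a stand-in for MySQL; the BEGIN-without-COMMIT shape is the same):

```python
import sqlite3

# Autocommit-style connection, like the one Nova expects to have.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER)")

# A code path that opens a transaction, only reads, and returns without
# COMMIT or ROLLBACK leaves the transaction dangling.
conn.execute("BEGIN")
conn.execute("SELECT COUNT(*) FROM t").fetchone()
print(conn.in_transaction)   # -> True: still open long after the read

# The pool's reset-on-return ROLLBACK is what normally cleans this up.
conn.execute("ROLLBACK")
print(conn.in_transaction)   # -> False
```

With reset-on-return disabled, nothing issues that final ROLLBACK, so the transaction stays open until the connection is next reused or closed.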

For reference, here's the change to disable reset_on_return:

--- a/nova/db/sqlalchemy/session.py
+++ b/nova/db/sqlalchemy/session.py
@@ -101,6 +101,7 @@ def get_engine():
             "pool_recycle": FLAGS.sql_idle_timeout,
             "echo": False,
             'convert_unicode': True,
+            'pool_reset_on_return': False,
         }

Sean Dague (sdague)
Changed in nova:
status: New → Confirmed
Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → Low
aeva black (tenbrae)
Changed in nova:
assignee: nobody → Devananda (devananda)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/10797

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
aeva black (tenbrae) wrote :

After digging into this again, I don't think it is safe to fix this right now.

The issue is that, without the pool reset-on-return, Nova leaves dangling transactions open. I initially thought this would be OK, but it is definitely _not_ok_: even devstack/exercise.sh blows up. Not only do some parts of the code fail to close their transactions, but other areas also fail to start a new transaction, instead inheriting the context of some random old transaction. This results in unpredictable errors. Here's an example:

nova/network/manager.py

1233     def allocate_fixed_ip(self, context, instance_id, network, **kwargs):
         ...
1255         get_vif = self.db.virtual_interface_get_by_instance_and_network
1256         vif = get_vif(context, instance_ref['uuid'], network['id'])
1257         values = {'allocated': True,
1258                   'virtual_interface_id': vif['id']}
In my tests, devstack/exercises/floating_ips.sh will randomly fail, but when it fails it is always here. The error is:
    'NoneType' object has no attribute '__getitem__'

I believe the cause is that the database session used by virtual_interface_get_by_instance_and_network() has a pre-existing transactional context, and so sees an older version of the database than the caller. In my tests, that transaction was often more than 10 seconds old by the time it was actually used. In the older snapshot the requested record did not yet exist, so the query returned no rows.

So, IMHO, disabling pool_reset_on_return is dangerous until all the Nova DB code handles transactions consistently: either everything uses autocommit, or everything opens and closes transactions explicitly. Right now there is a mixture.
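The stale-read-view failure described above can be imitated with stdlib sqlite3 in WAL mode, where a read transaction keeps a stable snapshot, loosely analogous to InnoDB's repeatable-read "read view". This is a sketch, not the Nova code; the table and file names are made up:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path, isolation_level=None)  # autocommit
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE vif (id INTEGER)")

# A session with a pre-existing transaction: its snapshot is pinned at
# the first read after BEGIN.
reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")
reader.execute("SELECT COUNT(*) FROM vif").fetchone()

# Another session commits a new row after that snapshot was taken.
writer.execute("INSERT INTO vif VALUES (1)")

# Inside the old transaction the new row is invisible: the record
# exists, but not in this session's read view, so the lookup comes back
# empty, just like the vif that "wasn't there" in allocate_fixed_ip.
stale = reader.execute("SELECT COUNT(*) FROM vif").fetchone()[0]
reader.execute("COMMIT")
fresh = reader.execute("SELECT COUNT(*) FROM vif").fetchone()[0]
print(stale, fresh)   # stale sees 0 rows, fresh sees 1
```

Ending the old transaction (which is what the pool's ROLLBACK does on check-in) refreshes the snapshot, which is why the bug only appears once reset-on-return is disabled.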

Changed in nova:
status: In Progress → Confirmed
Changed in cinder:
status: New → Confirmed
importance: Undecided → Low
Sean Dague (sdague)
Changed in nova:
assignee: Devananda van der Veen (devananda) → nobody
Changed in oslo.db:
importance: Undecided → Low
status: New → Triaged
Changed in nova:
status: Confirmed → Invalid
Changed in cinder:
status: Confirmed → Invalid
Changed in oslo.db:
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for oslo.db because there has been no activity for 60 days.]

Changed in oslo.db:
status: Incomplete → Expired