Ubuntu

Failures in db_pool code: 'NoneType' object has no attribute '_keymap' or not returning rows.

Reported by Antony Messerli on 2011-09-01
42
This bug affects 7 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
High
Brian Lamar
Diablo
Undecided
Unassigned
nova (Ubuntu)
High
Unassigned
Oneiric
High
Unassigned
Precise
High
Unassigned

Bug Description

Running rev 1514, getting this on instance build in XenServer.

2011-09-01 02:38:04,374 ERROR nova.rpc [-] Exception during message handling
(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/rpc/impl_kombu.py", line 620, in _process_data
(nova.rpc): TRACE: rval = node_func(context=ctxt, **node_args)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/exception.py", line 98, in wrapped
(nova.rpc): TRACE: return f(*args, **kw)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/compute/manager.py", line 453, in run_instance
(nova.rpc): TRACE: self._run_instance(context, instance_id, **kwargs)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/compute/manager.py", line 392, in _run_instance
(nova.rpc): TRACE: requested_networks=requested_networks)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/network/api.py", line 162, in allocate_for_instance
(nova.rpc): TRACE: 'args': args})
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/rpc/__init__.py", line 45, in call
(nova.rpc): TRACE: return get_impl().call(context, topic, msg)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/rpc/impl_kombu.py", line 739, in call
(nova.rpc): TRACE: rv = list(rv)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/rpc/impl_kombu.py", line 703, in __iter__
(nova.rpc): TRACE: raise result
(nova.rpc): TRACE: RemoteError: AttributeError 'NoneType' object has no attribute '_keymap'
(nova.rpc): TRACE: [u'Traceback (most recent call last):\n', u' File "/usr/lib/pymodules/python2.6/nova/rpc/impl_kombu.py", line 620, in _process_data\n rval = node_func(context=ctxt, **node_args)\n', u' File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 462, in allocate_for_instance\n networks, vpn=vpn)\n', u' File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 971, in _allocate_fixed_ips_hack\n self.allocate_fixed_ip(context, instance_id, nw)\n', u' File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 653, in allocate_fixed_ip\n instance_id)\n', u' File "/usr/lib/pymodules/python2.6/nova/db/api.py", line 343, in fixed_ip_associate_pool\n instance_id, host)\n', u' File "/usr/lib/pymodules/python2.6/nova/db/sqlalchemy/api.py", line 100, in wrapper\n return f(*args, **kwargs)\n', u' File "/usr/lib/pymodules/python2.6/nova/db/sqlalchemy/api.py", line 714, in fixed_ip_associate_pool\n with_lockmode(\'update\').\\\n', u' File "/usr/lib/python2.6/dist-packages/sqlalchemy/orm/query.py", line 1496, in first\n ret = list(self[0:1])\n', u' File "/usr/lib/python2.6/dist-packages/sqlalchemy/orm/query.py", line 1405, in __getitem__\n return list(res)\n', u' File "/usr/lib/python2.6/dist-packages/sqlalchemy/orm/query.py", line 1669, in instances\n fetch = cursor.fetchall()\n', u' File "/usr/lib/python2.6/dist-packages/sqlalchemy/engine/base.py", line 2383, in fetchall\n l = self.process_rows(self._fetchall_impl())\n', u' File "/usr/lib/python2.6/dist-packages/sqlalchemy/engine/base.py", line 2366, in process_rows\n keymap = metadata._keymap\n', u"AttributeError: 'NoneType' object has no attribute '_keymap'\n"]
(nova.rpc): TRACE:

Brian Lamar (blamar) wrote :

My suspicion here is that fixed_ip_associate_pool is failing when being called in quick succession aka at the 'same time'. I believe this is what "with_lockmode" is supposed to prevent...but the SQLAlchemy documentation is very poor in this area.

We're using InnoDB tables are being used.

Brian Lamar (blamar) wrote :

Arg, my guess is that the combination of SQLAlchemy + eventlet's db_pool is doing this? I'm going to have to look at this more closely but for now Ant, if you want to experiment...on the nova-network nodes just put --sql_max_pool=1 --sql_min_pool=1 to see if these errors go away.

Download full text (4.2 KiB)

That is odd, in mysql, with_lockmode is supposed to row lock and the ip should not be returned on the next query. Perhaps there is some other race condition occurring?

On Sep 2, 2011, at 11:46 AM, Brian Lamar wrote:

> My suspicion here is that fixed_ip_associate_pool is failing when being
> called in quick succession aka at the 'same time'. I believe this is
> what "with_lockmode" is supposed to prevent...but the SQLAlchemy
> documentation is very poor in this area.
>
> We're using InnoDB tables are being used.
>
> --
> You received this bug notification because you are a member of Nova Bug
> Team, which is subscribed to OpenStack Compute (nova).
> https://bugs.launchpad.net/bugs/838581
>
> Title:
> 'NoneType' object has no attribute '_keymap' on xenserver instance
> build
>
> Status in OpenStack Compute (Nova):
> New
>
> Bug description:
> Running rev 1514, getting this on instance build in XenServer.
>
> 2011-09-01 02:38:04,374 ERROR nova.rpc [-] Exception during message handling
> (nova.rpc): TRACE: Traceback (most recent call last):
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/rpc/impl_kombu.py", line 620, in _process_data
> (nova.rpc): TRACE: rval = node_func(context=ctxt, **node_args)
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/exception.py", line 98, in wrapped
> (nova.rpc): TRACE: return f(*args, **kw)
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/compute/manager.py", line 453, in run_instance
> (nova.rpc): TRACE: self._run_instance(context, instance_id, **kwargs)
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/compute/manager.py", line 392, in _run_instance
> (nova.rpc): TRACE: requested_networks=requested_networks)
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/network/api.py", line 162, in allocate_for_instance
> (nova.rpc): TRACE: 'args': args})
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/rpc/__init__.py", line 45, in call
> (nova.rpc): TRACE: return get_impl().call(context, topic, msg)
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/rpc/impl_kombu.py", line 739, in call
> (nova.rpc): TRACE: rv = list(rv)
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.6/nova/rpc/impl_kombu.py", line 703, in __iter__
> (nova.rpc): TRACE: raise result
> (nova.rpc): TRACE: RemoteError: AttributeError 'NoneType' object has no attribute '_keymap'
> (nova.rpc): TRACE: [u'Traceback (most recent call last):\n', u' File "/usr/lib/pymodules/python2.6/nova/rpc/impl_kombu.py", line 620, in _process_data\n rval = node_func(context=ctxt, **node_args)\n', u' File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 462, in allocate_for_instance\n networks, vpn=vpn)\n', u' File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 971, in _allocate_fixed_ips_hack\n self.allocate_fixed_ip(context, instance_id, nw)\n', u' File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 653, in allocate_fixed_ip\n instance_id)\n', u' File "/usr/lib/pymodules/python2.6/nova/db/api.py", line 343, in fixed_ip_associate_pool\n instance_id, host...

Read more...

Brian Lamar (blamar) on 2011-09-06
Changed in nova:
assignee: nobody → Brian Lamar (blamar)

Yup, this is fixed by adding a @utils.synchronized decorator on the DB API method which using SQLAlchemy's with_lockmode. What is happening SHOULD look something like this (queries have been simplified to protect my sanity):

Eventlet Greenthread #1 (green1)
Eventlet Greenthread #2 (green2)

green1: START TRANSACTION;
green2: START TRANSACTION;
green1: SELECT * FROM fixed_ips WHERE instance_id is NULL LIMIT 1 FOR UPDATE;
green2: SELECT * FROM fixed_ips WHERE instance_id is NULL LIMIT 1 FOR UPDATE;

At this point "green2" is 100% locked until...

green1: UPDATE fixed_ips SET instance_id = '1' WHERE address = '192.168.0.0';
green1: COMMIT;

Then green2 is able to continue...

green2: UPDATE fixed_ips SET instance_id = '1' WHERE address = '192.168.0.1';
green2: COMMIT;

This very much should work, and is what we *should* being doing in this situation. However, since we're doing 'weird' things with SQLAlchemy + eventlet I don't exactly know which piece is failing (although it seems to be SQLAlchemy?).

I'm going to see if I can investigate a little more, because I don't want to blindly 'fix' this without knowing what is going on, but the linked branch seems to fix all of the issues I can reproduce on my end.

Thierry Carrez (ttx) on 2011-09-07
Changed in nova:
importance: Undecided → High
status: New → In Progress
summary: - 'NoneType' object has no attribute '_keymap' on xenserver instance build
+ 'NoneType' object has no attribute '_keymap'
summary: - 'NoneType' object has no attribute '_keymap'
+ Failures in db_pool code: 'NoneType' object has no attribute '_keymap'
+ or not returning rows.
tags: added: diablo-backport

Reviewed: https://review.openstack.org/678
Committed: http://github.com/openstack/nova/commit/f3dd56e916232e38e74d9e2f24ce9a738cac63cf
Submitter: Jenkins
Branch: master

 status fixcommitted
 done

commit f3dd56e916232e38e74d9e2f24ce9a738cac63cf
Author: Brian Lamar <email address hidden>
Date: Mon Sep 26 11:52:21 2011 -0400

    Removed db_pool complexities from nova.db.sqlalchemy.session. Fixes bug 838581.

    Change-Id: I9b1f4790fcdb890a5113b853f1b1483f3212c530

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
status: Fix Committed → Fix Released
David Kranz (david-kranz) wrote :

Oops. I was trying to find out what "fix released" meant and did not realize I was changing the state the state of this ticket. It won't let me change it back.

Thierry Carrez (ttx) on 2011-09-27
Changed in nova:
status: Fix Released → Fix Committed
Scott Moser (smoser) on 2011-10-04
Changed in nova (Ubuntu):
importance: Undecided → High
status: New → Confirmed
milestone: none → ubuntu-11.10
tags: added: server-o-rs
bastichelaar (bas-t) wrote :

I patched my api.py to revert the db_pool complexities, but it still gives error like this when spawning 25+ instances at once:

2011-10-06 16:36:58,945 DEBUG nova.network.manager [6bf528ce-4e3d-4061-bd3b-1d25602a59ef None None] Leased IP |10.0.0.24| from (pid=27883) lease_fixed_ip /usr/lib/python2.7/dist-packages/nova/network/manager.py:645
ERROR nova.rpc [-] Exception during message handling
(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 620, in _process_data
(nova.rpc): TRACE: rval = node_func(context=ctxt, **node_args)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 221, in allocate_for_instance
(nova.rpc): TRACE: ips = super(FloatingIP, self).allocate_for_instance(context, **kwargs)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 444, in allocate_for_instance
(nova.rpc): TRACE: requested_networks=requested_networks)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 171, in _allocate_fixed_ips
(nova.rpc): TRACE: vpn=vpn, address=address)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 615, in allocate_fixed_ip
(nova.rpc): TRACE: instance_id)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 347, in fixed_ip_associate_pool
(nova.rpc): TRACE: instance_id, host)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 101, in wrapper
(nova.rpc): TRACE: return f(*args, **kwargs)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 724, in fixed_ip_associate_pool
(nova.rpc): TRACE: # then this has concurrency issues
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 1652, in first
(nova.rpc): TRACE: ret = list(self[0:1])
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 1560, in __getitem__
(nova.rpc): TRACE: return list(res)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 1823, in instances
(nova.rpc): TRACE: fetch = cursor.fetchall()
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2528, in fetchall
(nova.rpc): TRACE: l = self.process_rows(self._fetchall_impl())
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2497, in _fetchall_impl
(nova.rpc): TRACE: self._non_result()
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2502, in _non_result
(nova.rpc): TRACE: "This result object does not return rows. "
(nova.rpc): TRACE: ResourceClosedError: This result object does not return rows. It has been closed automatically.
(nova.rpc): TRACE:
ERROR nova.rpc [-] Returning exception This result object does not return rows. It has been closed automatically. to caller

Scott Moser (smoser) wrote :

Hi,
  I've just uploaded a ppa build of nova to https://launchpad.net/~smoser/+archive/ppa with a patch pulled back from trunk which is reported to fix this issue.
  Could someone listening here please try to with my ppa build and report back if that fixes your problem.

  To do this, simply run the following on your openstack systems.
  sudo apt-add-repository ppa:smoser/ppa
  sudo apt-get update
  sudo apt-get install nova

Then do whatever you did that would previously cause the failure.

Thanks.

bastichelaar (bas-t) wrote :

Hi Scott,

Just tested your PPA, and it seems to work fine now. I created 40 instances simultaneously, and it worked out fine now. What are the changes you made?

Thanks,

Bas

Scott Moser (smoser) wrote :
David Kranz (david-kranz) wrote :

I upgraded from Scott's ppa and still see the signs of bug 855030:

AMQPChannelException: (404, u"NOT_FOUND - no exchange '4b0a4421eb4a44ce96ce5f71c8523d16' in vhost '/'", (60, 40), 'Channel.basic_publish')

I looked at /usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/session.py and verified that it had the change mentioned in Scott's last comment. Many vms are created successfully but eventually this always happens to me when I create multiple instances with a single euca-run-instances command. I cannot understand why others do not see this problem. I will reopen that bug.

Nikolay Sokolov (chemikadze) wrote :

> on the nova-network nodes just put --sql_max_pool=1 --sql_min_pool=1 to see if these errors go away.

Tried that, but got same exceptions. Noticed that simultaneous connections exist anyway, opened lp872780.

Hello Antony, or anyone else affected,

Accepted nova into oneiric-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in nova (Ubuntu):
milestone: ubuntu-11.10 → none
status: Confirmed → Fix Committed
Changed in nova (Ubuntu Oneiric):
status: New → Fix Committed
tags: added: verification-needed
Dave Walker (davewalker) on 2011-10-16
Changed in nova (Ubuntu Precise):
milestone: none → precise-alpha-1
Changed in nova (Ubuntu Oneiric):
importance: Undecided → High
milestone: none → oneiric-updates
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2011.3-0ubuntu7

---------------
nova (2011.3-0ubuntu7) precise; urgency=low

  [Scott Moser]
  * Removed db_pool complexities from nova.db.sqlalchemy.session (LP: #838581)

  [Chuck Short]
  * debian/patches/fix-iscsi-target-path.patch: Fix ISCSI target path patch.
    (LP: #871278)
  * debian/control: Either install xen-hypervisor-4.1-amd64 or
    xen-hypervisor-4.1-i386 for nova-compute-xen. (LP: #873243)
 -- Dave Walker (Daviey) <email address hidden> Sun, 16 Oct 2011 23:00:52 +0100

Changed in nova (Ubuntu Precise):
status: Fix Committed → Fix Released

Reviewed: https://review.openstack.org/937
Committed: http://github.com/openstack/nova/commit/1a5e3e0c4cc2a3bf55331d46f702b980af1fbf97
Submitter: Jenkins
Branch: stable/diablo

 status fixcommitted
 done

commit 1a5e3e0c4cc2a3bf55331d46f702b980af1fbf97
Author: Brian Lamar <email address hidden>
Date: Mon Sep 26 11:52:21 2011 -0400

    Removed db_pool complexities from nova.db.sqlalchemy.session. Fixes bug 838581.

    (cherry picked from commit f3dd56e916232e38e74d9e2f24ce9a738cac63cf)

    Change-Id: I96442f9c9eeddc48ce8307da93ff289e3d679553

Jamie Strandboge (jdstrand) wrote :

oneiric-proposed has been superseded by the 2011.3-0ubuntu6.2 security update.

Thierry Carrez (ttx) on 2011-11-09
Changed in nova:
milestone: none → essex-1
Mark McLoughlin (markmc) on 2011-11-16
tags: removed: diablo-backport
Thierry Carrez (ttx) on 2011-11-17
Changed in nova:
status: Fix Committed → Fix Released
Dave Walker (davewalker) wrote :

Dropping Ubuntu Oneiric upload back to Inprogress as the security upload superseeded it.

Thanks.

Changed in nova (Ubuntu Oneiric):
status: Fix Committed → In Progress
tags: removed: verification-needed
Kegan Holtzhausen (marzubus) wrote :

Just wanted to say the fix in Scott's pap worked for me.

K

Hello Antony, or anyone else affected,

Accepted nova into oneiric-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in nova (Ubuntu Oneiric):
status: In Progress → Fix Committed
tags: added: verification-needed
Tom Ellis (tellis) wrote :

I've tested the package in oneiric-proposed and it fixed the issue. Thanks!

tags: added: verification-done
removed: verification-needed
Thierry Carrez (ttx) on 2012-04-05
Changed in nova:
milestone: essex-1 → 2012.1

I know this is an old bug, but I'm pretty confident now that the problem isn't in db_pool, just that the increased concurrency exposed bugs in nova. This was complicated by bugs in SQLAlchemy (since fixed) and in MySQLdb (which I've opened up a bug report and supplied a patch) that masked the real bug in nova.

The "'NoneType' object has no attribute '_keymap'" exception is most likely caused by a bug in SQLAlchemy that has been fixed for a while. I was only able to reproduce this with older versions of SQLAlchemy, whereas upgrading to a recent version results in the "ResourceClosedError: This result object does not return rows. It has been closed automatically." exception.

The exception is generated by SQLAlchemy, but is ultimately a bug in MySQLdb, where it doesn't always raise an exception correctly when it is received by the server. For some reason, with subtle changes to the query, errors can happen at different points in the protocol and MySQLdb doesn't handle it correctly if the error is received when calling mysql_store_result().

I opened up this bug report and attached a patch to fix the bug in MySQLdb:

https://sourceforge.net/tracker/?func=detail&aid=3546166&group_id=22307&atid=374932

After upgrading SQLAlchemy and fixing MySQLdb, you'll see the real problem, which is likely this exception:

"OperationError: Deadlock found when trying to get lock; try restarting transaction"

At least that was the case with the quota reservation code, tracked by this bug:

https://bugs.launchpad.net/nova/+bug/1026709

I strongly suspect these other cases are similar bugs with incorrect locking since at least the fixed_ip code uses with_lockmode as well.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.