metadata service PicklingError: Can't pickle <class 'sqlalchemy.orm.session.Session'> when using memcached

Bug #1694666 reported by Eduardo Gonzalez on 2017-05-31
This bug affects 4 people
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Medium
Assigned to: Artom Lifshitz

Bug Description

Description
===========

Launching an instance causes a lot of error messages in the nova-api logs.
The instance is not able to retrieve its metadata.

How to reproduce
================

Deploy nova from the master branch.
Spawn an instance.
Wait for the instance to become ACTIVE.
Error messages appear in the nova-api logs.
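
A minimal sketch of checking the symptom from inside the guest, assuming the standard EC2-style metadata endpoint:

# Run inside the guest once it is ACTIVE (Python 2, matching this deployment).
import urllib2

try:
    resp = urllib2.urlopen(
        'http://169.254.169.254/2009-04-04/meta-data/instance-id', timeout=10)
    print(resp.read())  # expected: the instance id
except urllib2.HTTPError as exc:
    # With this bug the metadata proxy answers HTTP 500 (see the neutron log below).
    print('metadata request failed with HTTP %s' % exc.code)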

Expected results
================

The instance retrieves its metadata.

Actual results
==============

The instance is not able to retrieve metadata.

Environment configuration
=========================

OpenStack deployed with kolla
Only source-based images built from master fail; binary images (RDO or Ubuntu packages) work for now.
CentOS, Ubuntu, and OracleLinux distributions are affected.
The database and memcached work as expected; other services consuming them are not affected.
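
For context, the caching path in the traceback below is only taken when nova.conf carries a memcached-backed [cache] section along these lines (illustrative values, not the exact kolla-generated config; the oslo_cache.memcache_pool backend matches the memcache_pool frames in the trace):

[cache]
enabled = true
backend = oslo_cache.memcache_pool
memcache_servers = 192.168.100.10:11211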

Logs
====

All logs can be found at the kolla gates:

Nova: http://logs.openstack.org/73/469373/1/check/gate-kolla-ansible-dsvm-deploy-centos-source-centos-7-nv/8cecb36/logs/kolla/nova/

Neutron: http://logs.openstack.org/73/469373/1/check/gate-kolla-ansible-dsvm-deploy-centos-source-centos-7-nv/8cecb36/logs/kolla/neutron/

Related errors:

Nova API:

2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler [req-3daa8e91-93e5-4676-b77a-048ad3dd53d2 - - - - -] Failed to get metadata for instance id: 8cbd067f-8cd6-4365-b299-3ffc146d0790
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler Traceback (most recent call last):
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/api/metadata/handler.py", line 285, in _get_meta_by_instance_id
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler remote_address)
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/api/metadata/handler.py", line 87, in get_metadata_by_instance_id
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler self._cache.set(cache_key, data)
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/cache_utils.py", line 116, in set
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler return self.region.set(key, value)
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler File "/var/lib/kolla/venv/lib/python2.7/site-packages/dogpile/cache/region.py", line 973, in set
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler self.backend.set(key, self._value(value))
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler File "/var/lib/kolla/venv/lib/python2.7/site-packages/dogpile/cache/backends/memcached.py", line 178, in set
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler **self.set_arguments
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_cache/backends/memcache_pool.py", line 32, in _run_method
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler return getattr(client, __name)(*args, **kwargs)
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler File "/var/lib/kolla/venv/lib/python2.7/site-packages/memcache.py", line 740, in set
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler return self._set("set", key, val, time, min_compress_len, noreply)
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler File "/var/lib/kolla/venv/lib/python2.7/site-packages/memcache.py", line 1060, in _set
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler return _unsafe_set()
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler File "/var/lib/kolla/venv/lib/python2.7/site-packages/memcache.py", line 1034, in _unsafe_set
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler store_info = self._val_to_store_info(val, min_compress_len)
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler File "/var/lib/kolla/venv/lib/python2.7/site-packages/memcache.py", line 998, in _val_to_store_info
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler pickler.dump(val)
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler PicklingError: Can't pickle <class 'sqlalchemy.orm.session.Session'>: it's not the same object as sqlalchemy.orm.session.Session
2017-05-31 09:09:34.703 31 ERROR nova.api.metadata.handler

Neutron metadata:

2017-05-31 09:09:24.985 23 DEBUG neutron.agent.metadata.agent [-] Request: GET /2009-04-04/meta-data/instance-id HTTP/1.0
Accept: */*
Connection: close
Content-Type: text/plain
Host: 169.254.169.254
User-Agent: curl/7.24.0 (x86_64-pc-linux-gnu) libcurl/7.24.0 OpenSSL/1.0.0j zlib/1.2.6
X-Forwarded-For: 10.0.0.5
X-Neutron-Network-Id: 50395d7c-fcc6-4aef-afdc-931c4573b0d1 __call__ /var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/metadata/agent.py:86
2017-05-31 09:09:25.269 23 WARNING neutron.agent.metadata.agent [-] Remote metadata server experienced an internal server error.
2017-05-31 09:09:25.269 23 INFO eventlet.wsgi.server [-] 10.0.0.5,<local> "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 500 len: 361 time: 0.2839720
2017-05-31 09:09:27.460 23 DEBUG eventlet.wsgi.server [-] (23) accepted '' server /var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/wsgi.py:883

Let me know if you need more information.

Regards, Eduardo

description: updated
Matt Riedemann (mriedem) wrote:

Does this also happen in Ocata? I'm wondering if this is a regression in Pike (current master branch).

When you say "Only source images from master fail, binary(rdo or ubuntu packages) works for now" - what is the difference? Are the binary packages from Ocata (nova 15.0.0)?

Eduardo Gonzalez (egonzalez90) wrote:

No, it's not happening in Ocata.
Latest build and deploy from the kolla gates (CentOS source):

http://logs.openstack.org/55/468655/1/check/gate-kolla-dsvm-deploy-centos-source-centos-7-nv/8fb1bea/logs/kolla/nova/nova-api.txt.gz

Matt Riedemann (mriedem) wrote:

So it looks like oslo.cache configured for dogpile/memcache will use python-memcached:

https://github.com/openstack/oslo.cache/blob/master/setup.cfg#L37

Which uses pickle:

https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L63

Reading http://www.benfrederickson.com/dont-pickle-your-data/, it sounds like that's a bad idea, not only in terms of performance but also security.
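
As a quick illustration of why the traceback above bottoms out in memcache.py's pickler, here is a minimal sketch with a made-up stand-in class (not nova code):

# python-memcached pickles any non-string value before sending it to the
# server, so an object graph that reaches something unpicklable blows up at
# set() time, before the data ever hits memcached.
import pickle

class FakeInstanceMetadata(object):
    """Made-up stand-in for nova's InstanceMetadata object graph."""
    def __init__(self):
        # Stand-in for the SQLAlchemy Session reachable through the cached
        # object; file handles, like DB sessions, are not picklable.
        self.session = open('/dev/null')

pickle.dumps(FakeInstanceMetadata())  # raises a pickling error (TypeError here,
                                      # PicklingError for the Session in the trace)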

It sounds like this has come up before though, and we shouldn't be trying to serialize the entire InstanceMetadata object which contains things inside it like Nova Instance objects, which could contain oslo.db Sessions that have weird c-code or something inside them.

What Nova should probably be doing is creating a smaller primitive of the InstanceMetadata object for serializing. Someone thought artom might be working on this, or maybe knows of something that is. We could ask him.

Changed in nova:
status: New → Confirmed
summary: metadata service PicklingError: Can't pickle <class
- 'sqlalchemy.orm.session.Session'>
+ 'sqlalchemy.orm.session.Session'> when using memcached
Artom Lifshitz (notartom) wrote:

An idea we agreed on in IRC was to call to_primitive() and from_primitive() before/after we hand stuff to / get stuff from the cache. If anyone wants to have a whack at that they're welcome :) Otherwise I'll try to make it happen.
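
A very rough sketch of that idea, not the actual patch proposed below; to_primitive()/from_primitive() are the helpers being proposed and don't exist yet, and the function names here are made up for illustration:

from oslo_serialization import jsonutils

def cache_metadata(cache, key, instance_metadata):
    # Hand the cache only a JSON/pickle-safe primitive, never the live
    # InstanceMetadata object graph that can reach a SQLAlchemy Session.
    cache.set(key, jsonutils.dumps(instance_metadata.to_primitive()))

def get_cached_metadata(cache, key, metadata_cls):
    data = cache.get(key)
    if data is None:
        return None
    # Rebuild the object from the stored primitive on a cache hit.
    return metadata_cls.from_primitive(jsonutils.loads(data))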

Matt Riedemann (mriedem) on 2017-06-01
Changed in nova:
importance: Undecided → Medium

Fix proposed to branch: master
Review: https://review.openstack.org/469978

Changed in nova:
assignee: nobody → Artom Lifshitz (notartom)
status: Confirmed → In Progress