"ValueError: Circular reference detected" in send_notification

Bug #1673375 reported by IWAMOTO Toshihiro
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Undecided
Balazs Gibizer
oslo.messaging
Invalid
Undecided
Unassigned

Bug Description

I guess something is wrong with nova, causing a tempest failure.

http://logs.openstack.org/08/445308/3/check/gate-tempest-dsvm-py35-ubuntu-xenial/7bf0d72/logs/screen-n-api.txt.gz#_2017-03-16_05_31_09_399

2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging Traceback (most recent call last):
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging context)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/default.py", line 450, in do_execute
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging cursor.execute(statement, parameters)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/pymysql/cursors.py", line 166, in execute
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging result = self._query(query)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/pymysql/cursors.py", line 322, in _query
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging conn.query(q)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 852, in query
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 1053, in _read_query_result
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging result.read()
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 1336, in read
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging first_packet = self.connection._read_packet()
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 1010, in _read_packet
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging packet.check_error()
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 393, in check_error
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging err.raise_mysql_exception(self._data)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/pymysql/err.py", line 107, in raise_mysql_exception
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging raise errorclass(errno, errval)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging pymysql.err.IntegrityError: (1062, "Duplicate entry 'ubuntu-xenial-rax-iad-7906449-6' for key 'uniq_aggregate_hosts0host0aggregate_id'")

Revision history for this message
Matt Riedemann (mriedem) wrote :

Hmm, while ugly, that is a negative test: AggregatesAdminNegativeTestJSON

So I don't think that's why the test job failed:

{3} tempest.api.compute.admin.test_aggregates_negative.AggregatesAdminNegativeTestJSON.test_aggregate_add_existent_host [6.398727s] ... ok

Revision history for this message
Matt Riedemann (mriedem) wrote :

It looks like the traceback is due to this:

2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging During handling of the above exception, another exception occurred:
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging Traceback (most recent call last):
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/notify/messaging.py", line 70, in notify
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging retry=retry)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/transport.py", line 104, in _send_notification
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging retry=retry)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 509, in send_notification
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging envelope=(version == 2.0), notify=True, retry=retry)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 457, in _send
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging msg = rpc_common.serialize_msg(msg)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/_drivers/common.py", line 293, in serialize_msg
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging _MESSAGE_KEY: jsonutils.dumps(raw_msg)}
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/local/lib/python3.5/dist-packages/oslo_serialization/jsonutils.py", line 190, in dumps
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging return json.dumps(obj, default=default, **kwargs)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/lib/python3.5/json/__init__.py", line 237, in dumps
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging **kw).encode(obj)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging chunks = self.iterencode(o, _one_shot=True)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging return _iterencode(o, 0)
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging ValueError: Circular reference detected
2017-03-16 05:31:09.399 23355 ERROR oslo_messaging.notify.messaging

Revision history for this message
Matt Riedemann (mriedem) wrote :

Looking at logstash:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22_send_notification%5C%22%20AND%20message%3A%5C%22ValueError%3A%20Circular%20reference%20detected%5C%22&from=7d

This seems to only show up on that aggregate negative test, and it's probably this error notification wrapper:

https://github.com/openstack/nova/blob/2380659e358770a3f36253b93a112b9779a23958/nova/compute/api.py#L4602

The AggregateHostExists duplicate error comes out of the call to "aggregate.add_host(host_name)".

I'm not sure what's breaking in the actual notification driver. It also looks like this is happening on non-py35 jobs too.

summary: - Duplicate entry for key 'uniq_aggregate_hosts0host0aggregate_id
+ "ValueError: Circular reference detected" in send_notification
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

It seems the legacy notification emitted from the wrap_exception decorator dumps the context to the payload. The context contains some objects like <oslo_db.sqlalchemy.enginefacade._Default object at 0x7f49ef3f7af8> and that object might have circular references json.dumps doesn't like.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :
Revision history for this message
Shunli Zhou (shunliz) wrote :

jsonutils check the "Circular reference" as below:
      if markers is not None:
                markerid = id(o)
                if markerid in markers:
                    raise ValueError("Circular reference detected")
                markers[markerid] = o
..................
       if markers is not None:
                del markers[markerid]

Did some simple tests as below:
>>> from oslo_db.sqlalchemy.enginefacade import _Default
>>>
>>>
>>> d1 = _Default()
>>> d2 = _Default()
>>> id(d1)
140398845648296
>>> id(d2)
44938024
>>> d2 = _Default(5)
>>> d1 = _Default(5)
>>> id(d2)
44938080
>>> id(d1)
44938136

Seems oslo_db.sqlalchemy.enginefacade._Default instances's id() may not collision each other.

But according to jsonutils.dump source code, it seems only the oslo_db.sqlalchemy.enginefacade._Default instances'id may collision each other and cause "Circular reference".

Maybe #5 can solve this bug.

Changed in nova:
assignee: nobody → Balazs Gibizer (balazs-gibizer)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/446948

Changed in nova:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/447071

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/447072

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/447075

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/446948
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3bf177a59cfd0b4e74dba256c3466ba2ea9bfbf7
Submitter: Jenkins
Branch: master

commit 3bf177a59cfd0b4e74dba256c3466ba2ea9bfbf7
Author: Balazs Gibizer <email address hidden>
Date: Fri Mar 17 11:24:49 2017 +0100

    do not include context to exception notification

    The wrap_exception decorator optionally emited a notification.
    Based on the code comments the original intention was not to include the
    context to that notification due to security reasons. However the
    implementation did included the context to the payload of the legacy
    notification.

    Recently we saw circural reference errors during the payload serialization
    of this notification. Based on the logs the only complex data structure
    that could cause circural reference is the context. So this patch
    removes the context from the legacy exception notification.

    The versioned exception notification is not affected as it does not
    contain the args of the decorated function.

    Closes-Bug: #1673375
    Change-Id: I1d217620e52d45595a3e0e49ed57b4ab33cd1688

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/447071
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a8a1915456a86f504d23f215867da730d436fe33
Submitter: Jenkins
Branch: stable/ocata

commit a8a1915456a86f504d23f215867da730d436fe33
Author: Balazs Gibizer <email address hidden>
Date: Fri Mar 17 11:24:49 2017 +0100

    do not include context to exception notification

    The wrap_exception decorator optionally emited a notification.
    Based on the code comments the original intention was not to include the
    context to that notification due to security reasons. However the
    implementation did included the context to the payload of the legacy
    notification.

    Recently we saw circural reference errors during the payload serialization
    of this notification. Based on the logs the only complex data structure
    that could cause circural reference is the context. So this patch
    removes the context from the legacy exception notification.

    The versioned exception notification is not affected as it does not
    contain the args of the decorated function.

    Closes-Bug: #1673375
    Change-Id: I1d217620e52d45595a3e0e49ed57b4ab33cd1688
    (cherry picked from commit 3bf177a59cfd0b4e74dba256c3466ba2ea9bfbf7)

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/447072
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d0ee248bab6727555561c15998c58a0f11a5351b
Submitter: Jenkins
Branch: stable/newton

commit d0ee248bab6727555561c15998c58a0f11a5351b
Author: Balazs Gibizer <email address hidden>
Date: Fri Mar 17 11:24:49 2017 +0100

    do not include context to exception notification

    The wrap_exception decorator optionally emited a notification.
    Based on the code comments the original intention was not to include the
    context to that notification due to security reasons. However the
    implementation did included the context to the payload of the legacy
    notification.

    Recently we saw circural reference errors during the payload serialization
    of this notification. Based on the logs the only complex data structure
    that could cause circural reference is the context. So this patch
    removes the context from the legacy exception notification.

    The versioned exception notification is not affected as it does not
    contain the args of the decorated function.

    Closes-Bug: #1673375
    Change-Id: I1d217620e52d45595a3e0e49ed57b4ab33cd1688
    (cherry picked from commit 3bf177a59cfd0b4e74dba256c3466ba2ea9bfbf7)
    (cherry picked from commit a8a1915456a86f504d23f215867da730d436fe33)

tags: added: in-stable-newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/447075
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=305cdb38db47258909ef83d5918c7c85ef9d7a5b
Submitter: Jenkins
Branch: stable/mitaka

commit 305cdb38db47258909ef83d5918c7c85ef9d7a5b
Author: Balazs Gibizer <email address hidden>
Date: Fri Mar 17 11:24:49 2017 +0100

    do not include context to exception notification

    The wrap_exception decorator optionally emited a notification.
    Based on the code comments the original intention was not to include the
    context to that notification due to security reasons. However the
    implementation did included the context to the payload of the legacy
    notification.

    Recently we saw circural reference errors during the payload serialization
    of this notification. Based on the logs the only complex data structure
    that could cause circural reference is the context. So this patch
    removes the context from the legacy exception notification.

    The versioned exception notification is not affected as it does not
    contain the args of the decorated function.

    Conflicts:
          nova/exception_wrapper.py
          nova/tests/unit/test_exception.py

    NOTE(mriedem): The conflict is due to some refactor in Newton:
    6329d721ef326488d5d660e4f68febf563ed93ab

    Closes-Bug: #1673375
    Change-Id: I1d217620e52d45595a3e0e49ed57b4ab33cd1688
    (cherry picked from commit 3bf177a59cfd0b4e74dba256c3466ba2ea9bfbf7)
    (cherry picked from commit a8a1915456a86f504d23f215867da730d436fe33)
    (cherry picked from commit d0ee248bab6727555561c15998c58a0f11a5351b)

tags: added: in-stable-mitaka
Revision history for this message
IWAMOTO Toshihiro (iwamoto) wrote :

Thanks for fixing the bug!

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

I think the fault is originated from nova and cannot really be dealt with in oslo.messaging hence I'm setting the bug on oslo.messaging as invalid

Changed in oslo.messaging:
status: New → Invalid
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 13.1.4

This issue was fixed in the openstack/nova 13.1.4 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.5

This issue was fixed in the openstack/nova 14.0.5 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.2

This issue was fixed in the openstack/nova 15.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0b1

This issue was fixed in the openstack/nova 16.0.0.0b1 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.