500 error on nova datasource delete

Bug #1662809 reported by Eric K
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
congress
Fix Released
Critical
Unassigned

Bug Description

Critical because the error leaves the congress data structure in a corrupted state, causing all further policy changes to error.

I encountered a 500 error on datasource delete.

$ openstack congress policy create policy1
$ openstack congress policy rule create policy1 'execute[nova:flavors.delete(id)] :- nova:flavors(id=id,vcpus=x),s(x), q(10,10)'
$ openstack congress datasource delete nova
Internal server error (HTTP 500) (Request-ID: req-2f8c2788-789d-4ad5-a75a-4d8ef54fa8bc)

The rule creation is required. Appears to need both the nova reference in the head and the body.

If we continue from the error and continue to create & delete the datasource, the same error occurs.
$ openstack congress datasource create --config username=admin --config tenant_name=admin --config auth_url=http://127.0.0.1:5000/v3 --config password=password --config poll_time=5 nova nova
+-------------+----------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------+----------------------------------------------------------------------------------------------------------------+
| config | {u'username': u'admin', u'tenant_name': u'admin', u'auth_url': u'http://127.0.0.1:5000/v3', u'password': |
| | u'password', u'poll_time': u'5'} |
| description | None |
| driver | nova |
| enabled | True |
| id | 32760158-4492-471b-ac45-78897be602bf |
| name | nova |
| type | None |
+-------------+----------------------------------------------------------------------------------------------------------------+
$ openstack congress datasource delete nova
Internal server error (HTTP 500) (Request-ID: req-cedea388-b356-4126-a811-ed53cfcd652f)

But then this strange thing happens:
If I delete and create policy1, and execute the same commands that caused the error the first time, no error occurs.

$ openstack congress datasource create --config username=admin --config tenant_name=admin --config auth_url=http://127.0.0.1:5000/v3 --config password=password --config poll_time=5 nova nova
+-------------+----------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------+----------------------------------------------------------------------------------------------------------------+
| config | {u'username': u'admin', u'tenant_name': u'admin', u'auth_url': u'http://127.0.0.1:5000/v3', u'password': |
| | u'password', u'poll_time': u'5'} |
| description | None |
| driver | nova |
| enabled | True |
| id | 8ecba93c-da6d-4e08-bf0c-210a7892a536 |
| name | nova |
| type | None |
+-------------+----------------------------------------------------------------------------------------------------------------+
$ openstack congress policy delete policy1
user@ubuntu:~$ openstack congress policy create policy1
+--------------+--------------------------------------+
| Field | Value |
+--------------+--------------------------------------+
| abbreviation | polic |
| description | |
| id | 0a2e97a7-dbc4-434a-af41-26b97d3cae8f |
| kind | nonrecursive |
| name | policy1 |
| owner_id | user |
+--------------+--------------------------------------+
$ openstack congress policy rule create policy1 'execute[nova:flavors.delete(id)] :- nova:flavors(id=id,vcpus=x),s(x), q(10,10)'
+---------+--------------------------------------+
| Field | Value |
+---------+--------------------------------------+
| comment | |
| id | 03cb7e6f-074b-4c79-8b51-c67da576ce59 |
| name | None |
| rule | execute[nova:flavors.delete(id)] :- |
| | nova:flavors(id=id, vcpus=x), |
| | s(x), |
| | q(10, 10) |
+---------+--------------------------------------+
$ openstack congress datasource delete nova

But it only seems to work AFTER the error has already occurred. Creating and deleting policy1 on a newly started instance of congress before executing the first script results in the 500 error.

Partial log:
2017-02-08 00:29:18.920 ERROR congress.api.application [^[[01;36mreq-2f8c2788-789d-4ad5-a75a-4d8ef54fa8bc ^[[00;36madmin 4fbfbf77ca1b46e289297020b994b418] ^[[01;35mTraceback (most recent call last):^M
  File "/opt/stack/congress/congress/api/application.py", line 53, in __call__^M
    response = handler.handle_request(request)^M
  File "/opt/stack/congress/congress/api/webservice.py", line 261, in handle_request^M
    return self.delete(request)^M
  File "/opt/stack/congress/congress/api/webservice.py", line 351, in delete^M
    id_, request.params, context=self._get_context(request))^M
  File "/opt/stack/congress/congress/api/datasource_model.py", line 119, in delete_item^M
    timeout=self.dse_long_timeout)^M
  File "/opt/stack/congress/congress/api/base.py", line 40, in invoke_rpc^M
    caller, name, kwargs, timeout=timeout, local=local)^M
  File "/opt/stack/congress/congress/dse2/data_service.py", line 193, in rpc^M
    service, action, kwargs, timeout=timeout, local=local, retry=retry)^M
  File "/opt/stack/congress/congress/dse2/dse_node.py", line 336, in invoke_service_rpc^M
    result = client.call(self.context, method, **kwargs)^M
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 465, in call^M
    return self.prepare().call(ctxt, method, **kwargs)^M
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 169, in call^M
    retry=self.retry)^M
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 97, in _send^M
    timeout=timeout, retry=retry)^M
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 458, in send^M
    retry=retry)^M
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 449, in _send^M
    raise result^M
DataServiceError_Remote: failed to synchronize_datasource after deleting datasource: cf33289d-b4a0-43ee-89c7-c1f6939df259^M
Traceback (most recent call last):^M
^M
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming^M
    res = self.dispatcher.dispatch(message)^M
^M
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch^M
    return self._do_dispatch(endpoint, method, ctxt, args)^M
^M
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch^M
    result = func(ctxt, **new_args)^M
^M
  File "/opt/stack/congress/congress/dse2/datasource_manager.py", line 134, in delete_datasource^M
    return self.ds_manager.delete_datasource(datasource)^M
^M
  File "/opt/stack/congress/congress/dse2/datasource_manager.py", line 122, in delete_datasource^M
    raise exception.DataServiceError(msg)^M
^M
DataServiceError: failed to synchronize_datasource after deleting datasource: cf33289d-b4a0-43ee-89c7-c1f6939df259^M

Full log attached.

Encountered on commit eff5618aac492b33f37edea7dcd6bf20036ac222.

Revision history for this message
Eric K (ekcs) wrote :
Revision history for this message
Eric K (ekcs) wrote :

Subsequent delete requests encountered the same error.

Eric K (ekcs)
summary: - 500 error (occasional) on datasource delete
+ 500 error on nova datasource delete
Eric K (ekcs)
description: updated
Eric K (ekcs)
description: updated
Eric K (ekcs)
description: updated
Changed in congress:
importance: Undecided → Critical
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to congress (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/435296

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to congress (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/435297

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to congress (master)

Reviewed: https://review.openstack.org/435294
Committed: https://git.openstack.org/cgit/openstack/congress/commit/?id=989e8019d5b466f5eef40ba7154284bca0084300
Submitter: Jenkins
Branch: master

commit 989e8019d5b466f5eef40ba7154284bca0084300
Author: Eric K <email address hidden>
Date: Fri Feb 17 00:04:49 2017 -0800

    Exclude atomic rule from dependency graph

    Dependency graph used by agnostic excludes positive literals
    of type compile.Literal. But when a policy is deleted, the facts
    in the policy are deleted as atomic rules of type compile.Rule. As
    a result, some nodes in the dependency graph are prematurely deleted
    because the deleting of facts (type compile.Rule) via policy delete
    decreased ref counters that had not been correspondingly increased
    by the adding of those same facts (type compile.Literal). When a
    dependency graph is thus corrupted, congress gives Internal Server
    Error on all future attempts to add/delete/sync rules.

    This patch fixes the problem by having agnostic treat an atomic rule
    (type compile.Rule) the same way it treats an atom
    (type compile.Literal) in the dependency graph -- ignore both.

    Closes-Bug: 1662809

    Change-Id: I8080cbb7c375d90259f7b8a2a62d714ebe4aee5f

Changed in congress:
status: New → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to congress (stable/ocata)

Reviewed: https://review.openstack.org/435296
Committed: https://git.openstack.org/cgit/openstack/congress/commit/?id=d635bad1ae10d84e8542916d18af573d4b1d1280
Submitter: Jenkins
Branch: stable/ocata

commit d635bad1ae10d84e8542916d18af573d4b1d1280
Author: Eric K <email address hidden>
Date: Fri Feb 17 00:04:49 2017 -0800

    Exclude atomic rule from dependency graph

    Dependency graph used by agnostic excludes positive literals
    of type compile.Literal. But when a policy is deleted, the facts
    in the policy are deleted as atomic rules of type compile.Rule. As
    a result, some nodes in the dependency graph are prematurely deleted
    because the deleting of facts (type compile.Rule) via policy delete
    decreased ref counters that had not been correspondingly increased
    by the adding of those same facts (type compile.Literal). When a
    dependency graph is thus corrupted, congress gives Internal Server
    Error on all future attempts to add/delete/sync rules.

    This patch fixes the problem by having agnostic treat an atomic rule
    (type compile.Rule) the same way it treats an atom
    (type compile.Literal) in the dependency graph -- ignore both.

    Closes-Bug: 1662809

    Change-Id: I8080cbb7c375d90259f7b8a2a62d714ebe4aee5f

tags: added: in-stable-ocata
Eric K (ekcs)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/congress 5.0.0.0rc2

This issue was fixed in the openstack/congress 5.0.0.0rc2 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to congress (stable/newton)

Reviewed: https://review.openstack.org/435297
Committed: https://git.openstack.org/cgit/openstack/congress/commit/?id=fc64736d1929151b9a47bb6fefa09dffd38565d8
Submitter: Jenkins
Branch: stable/newton

commit fc64736d1929151b9a47bb6fefa09dffd38565d8
Author: Eric K <email address hidden>
Date: Fri Feb 17 00:04:49 2017 -0800

    Exclude atomic rule from dependency graph

    Dependency graph used by agnostic excludes positive literals
    of type compile.Literal. But when a policy is deleted, the facts
    in the policy are deleted as atomic rules of type compile.Rule. As
    a result, some nodes in the dependency graph are prematurely deleted
    because the deleting of facts (type compile.Rule) via policy delete
    decreased ref counters that had not been correspondingly increased
    by the adding of those same facts (type compile.Literal). When a
    dependency graph is thus corrupted, congress gives Internal Server
    Error on all future attempts to add/delete/sync rules.

    This patch fixes the problem by having agnostic treat an atomic rule
    (type compile.Rule) the same way it treats an atom
    (type compile.Literal) in the dependency graph -- ignore both.

    Closes-Bug: 1662809

    Change-Id: I8080cbb7c375d90259f7b8a2a62d714ebe4aee5f

tags: added: in-stable-newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/congress 4.0.1

This issue was fixed in the openstack/congress 4.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/congress 6.0.0.0b1

This issue was fixed in the openstack/congress 6.0.0.0b1 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.