OpenStack Heat

Tests fails with SQL error 'Command Out of Sync'

Bug #1546431 reported by Thomas Herve on 2016-02-17

This bug report is a duplicate of: Bug #1499669: Heat stucks in DELETE_IN_PROGRESS for some input data. Edit Remove

This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Heat	Fix Released	High	Steve Baker	OpenStack Heat newton-rc1

Bug Description

Seen here: http://logs.openstack.org/91/280691/2/gate/gate-heat-dsvm-functional-orig-mysql/065e324

Captured traceback:
~~~~~~~~~~~~~~~~~~~
     Traceback (most recent call last):
       File "/opt/stack/new/heat/heat_integrationtests/common/test.py", line 339, in _stack_delete
         success_on_not_found=True)
       File "/opt/stack/new/heat/heat_integrationtests/common/test.py", line 329, in _wait_for_stack_status
         raise exceptions.TimeoutException(message)
     heat_integrationtests.common.exceptions.TimeoutException: Request timed out
     Details: Stack InstanceGroupUpdatePolicyTest-2000329881 failed to reach DELETE_COMPLETE status within the required time (1200 s).

On the engine logs we can see:

Exception during reset or similar
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 636, in _finalize_fairy
    fairy._reset(pool)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 776, in _reset
    pool._dialect.do_rollback(self)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/dialects/mysql/base.py", line 2526, in do_rollback
    dbapi_connection.rollback()
  File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 762, in rollback
    self._read_ok_packet()
  File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 738, in _read_ok_packet
    raise err.OperationalError(2014, "Command Out of Sync")
OperationalError: (2014, 'Command Out of Sync')

It could be a thread-safety issue? There is no other error that I can see.

Tags:

Angus Salkeld (asalkeld) on 2016-03-31

tags:

added: gate-failure

Revision history for this message

Zane Bitter (zaneb) wrote on 2016-06-15:

I've also noticed that tests that timeout with no explanation (i.e. no failed tests, the gate runner just kills the job) are due to this error, since in the test cleanup code we delete any stacks the test created *and* wait for them to reach DELETE_COMPLETE (why???). So this problem could be much more widespread than we thought.

Example: http://logs.openstack.org/98/323998/2/check/gate-heat-dsvm-functional-orig-mysql-lbaasv2/0fdca14/

Changed in heat:
status:	New → Triaged

Zane Bitter (zaneb) on 2016-06-15

Changed in heat:
milestone:	none → newton-2

Revision history for this message

Zane Bitter (zaneb) wrote on 2016-06-16:

Oh yeah, this is everywhere.

Revision history for this message

Zane Bitter (zaneb) wrote on 2016-06-16:

I wouldn't be surprised if bug 1483670 turned out to have the same cause.

Revision history for this message

Steve Baker (steve-stevebaker) wrote on 2016-06-29:

Logstash query, 42 occurrences in the last 7 days http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Command%20Out%20of%20Sync%5C%22%20tags%3A%5C%22screen-h-eng%5C%22

Revision history for this message

Rabi Mishra (rabi) wrote on 2016-06-30:

It seems to be happening mostly when we stop the greenthread while deleting an IN_PROGRESS stack and the existing action is interrupted.

In these cases the 'Command Out of Sync' errors are associated with ResourceClosedError.

- Few cases it fails when the thread linked function(release) is called which in turn calls persist_state_and_release_lock[1].
- Other cases it fails while setting the resource state to FAILED[2]

Most of the cases this happens for StackValidationTest where we try to delete(cleanup) an IN_PROGRESS stack.

[1]

http://logs.openstack.org/99/330799/5/gate/gate-heat-dsvm-functional-orig-mysql-lbaasv2/c7e34e3/logs/screen-h-eng.txt.gz#_2016-06-30_05_01_58_311

[2]
http://logs.openstack.org/03/335303/3/check/gate-heat-dsvm-functional-orig-mysql-lbaasv2/ffb5434/logs/screen-h-eng.txt.gz#_2016-06-29_22_17_02_904

There are few other cases it fails with just 'Exception during reset or simillar'. Not sure how that happens.

Thomas Herve (therve) on 2016-07-12

Changed in heat:
milestone:	newton-2 → newton-3

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-07-13: Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/341373

Changed in heat:
assignee:	nobody → Rabi Mishra (rabi)
status:	Triaged → In Progress

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-07-26: Fix merged to heat (master)

Reviewed: https://review.openstack.org/341373
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=d01b4dc7a8020fb1ae88580ef3b41d279e66a70c
Submitter: Jenkins
Branch: master

commit d01b4dc7a8020fb1ae88580ef3b41d279e66a70c
Author: rabi <email address hidden>
Date: Wed Jul 13 14:18:17 2016 +0530

Don't persist_state of IN_PROGRESS stack again

    When a greenthread is stopped(ex. deleting an IN_PROGRESS stack),
    the linked function(release) is called and it tries to save
    the state of the stack. As we already saved the state of the
    IN_PROGRESS stack, we should not try to do it again.

Change-Id: I598afcd286881bc1d9394152fe38701d5fed473f
Partial-Bug: #1546431

Rabi Mishra (rabi) on 2016-07-28

Changed in heat:
assignee:	Rabi Mishra (rabi) → nobody

Thomas Herve (therve) on 2016-08-29

Changed in heat:
status:	In Progress → New
milestone:	newton-3 → next

Revision history for this message

Zane Bitter (zaneb) wrote on 2016-09-06:

I believe bug 1499669 is a duplicate of this one.

Changed in heat:
status:	New → Triaged
milestone:	next → newton-rc1

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-09-13: Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/369638

Revision history for this message

Steve Baker (steve-stevebaker) wrote on 2016-09-14:

#10

I'm going to mark this as the duplicate, since there is a local reproducer in 1546431

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-09-14: Related fix merged to heat (master)

#11

Reviewed: https://review.openstack.org/369638
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=6d0e8e9f68018ae64d61f76db900cd20d60d3eed
Submitter: Jenkins
Branch: master

commit 6d0e8e9f68018ae64d61f76db900cd20d60d3eed
Author: Zane Bitter <email address hidden>
Date: Tue Sep 13 15:05:16 2016 -0400

Always use context manager for DB transactions

An exception could occur at any time, so we want to make sure we roll
back any uncompleted transaction when that happens.

Change-Id: Id0d5b65f44981bde24840b8a0d5fdc0b88f24d8a
Related-bug: #1546431

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-09-14: Fix proposed to heat (master)

#12

Fix proposed to branch: master
Review: https://review.openstack.org/369827

Changed in heat:
assignee:	nobody → Steve Baker (steve-stevebaker)
status:	Triaged → In Progress

Revision history for this message

Zane Bitter (zaneb) wrote on 2016-09-14:

#13

BTW the logstash query above is incorrect, it should be:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Command%20Out%20of%20Sync%5C%22%20AND%20tags%3A%5C%22screen-h-eng.txt%5C%22

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-09-14: Related fix proposed to heat (master)

#14

Related fix proposed to branch: master
Review: https://review.openstack.org/370301

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-09-20: Fix merged to heat (master)

#15

Reviewed: https://review.openstack.org/369827
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=3000f904080d8dcd841d913dcd2ae658fb526c1a
Submitter: Jenkins
Branch: master

commit 3000f904080d8dcd841d913dcd2ae658fb526c1a
Author: Steve Baker <email address hidden>
Date: Fri Sep 16 03:29:59 2016 +0000

Legacy delete attempt thread cancel before stop

    The error messages 'Command Out of Sync' are due to the threads being
    stopped in the middle of the database operations. This happens in the
    legacy action when delete is requested during a stack create.

    We have the thread cancel message but that was not being used in this
    case. Thread cancel should provide a more graceful way of ensuring the
    stack is in a FAILED state before the delete is attempted.

    This changes does the following in the delete_stack service method for
    legace engine:
    - if the stack is still locked, send thread cancel message
    - in a subthread wait for the lock to be released, or until a
      timeout based on the 4 minute cancel grace period
    - if the stack is still locked, do a thread stop as before

    Closes-Bug: #1499669
    Closes-Bug: #1546431
    Closes-Bug: #1536451
    Change-Id: I4cd613681f07d295955c4d8a06505d72d83728a0

Changed in heat:
status:	In Progress → Fix Released

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-09-20: Fix proposed to heat (stable/newton)

#16

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/373518

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-09-21: Fix merged to heat (stable/newton)

#17

Reviewed: https://review.openstack.org/373518
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=2dd44db1b9cf4b789d8a083df6f97ae1fb5e22d5
Submitter: Jenkins
Branch: stable/newton

commit 2dd44db1b9cf4b789d8a083df6f97ae1fb5e22d5
Author: Steve Baker <email address hidden>
Date: Fri Sep 16 03:29:59 2016 +0000

Legacy delete attempt thread cancel before stop

    Closes-Bug: #1499669
    Closes-Bug: #1546431
    Closes-Bug: #1536451
    Change-Id: I4cd613681f07d295955c4d8a06505d72d83728a0
    (cherry picked from commit 3000f904080d8dcd841d913dcd2ae658fb526c1a)

tags:

added: in-stable-newton

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-09-21: Change abandoned on heat (master)

#18

Change abandoned by Crag Wolfe (<email address hidden>) on branch: master
Review: https://review.openstack.org/372804

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-09-27: Fix included in openstack/heat 7.0.0.0rc2

#19

This issue was fixed in the openstack/heat 7.0.0.0rc2 release candidate.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-11-10: Fix included in openstack/heat 7.0.0

#20

This issue was fixed in the openstack/heat 7.0.0 release.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-11-17: Fix included in openstack/heat 8.0.0.0b1

#21

This issue was fixed in the openstack/heat 8.0.0.0b1 development milestone.

Report a bug

This report contains Public information

Everyone can see this information.

Duplicate of bug #1499669 Remove

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.