Comment 9 for bug 1738653

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/pike)

Reviewed: https://review.openstack.org/558826
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=3ac0bbedccb50e4e5b6bad663a872985d908e82a
Submitter: Zuul
Branch: stable/pike

commit 3ac0bbedccb50e4e5b6bad663a872985d908e82a
Author: Zane Bitter <email address hidden>
Date: Wed Apr 4 09:51:12 2018 -0400

    Avoid race in OSWaitCondition test

    While sending a signal to a WaitCondition is synchronous, the actual update
    of the WaitConditionHandle metadata happens asynchronously since the fix
    for bug 1394095. As a result, it's not guaranteed that even the first 6
    signals (which are sent serially, as opposed to the later ones which are
    deliberately sent in parallel) will be stored in the same order that they
    are sent.

    Crucially, that means that one or more of the signals explicitly sent with
    ID 5 may arrive when there have been only three previous signals stored.
    This means that the next signal to arrive with an implicit ID will be the
    fifth signal stored, and will therefore also get ID 5. Of course we have a log
    message to indicate when an existing signal is overwritten by another with
    the same ID, and we are not seeing it except in the intended case where we
    explicitly send in the same ID twice. That's because the keys have
    different types in the data dict - the explicitly specified ID is the
    string "5", but the implicitly calculated one is the integer 5. But - get
    this - when we serialise the data to JSON both keys are serialised to the
    string "5", and upon deserialisation they collide and one is silently
    dropped on the floor.
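
The key collision can be reproduced with nothing but Python's standard json
module. This is a minimal sketch rather than Heat's code: the dict contents
and the variable names are made up for illustration.

```python
import json

# Hypothetical signal data: one explicitly supplied ID (the string "5")
# and one implicitly calculated ID (the integer 5).
data = {"5": "explicit signal", 5: "implicit signal"}

# In memory these are two distinct keys, so no overwrite is detected.
keys_before = len(data)

# json.dumps converts the integer key to the string "5", so the JSON text
# contains the same key twice; json.loads then keeps only one of them.
round_tripped = json.loads(json.dumps(data))
keys_after = len(round_tripped)

print(keys_before, keys_after, list(round_tripped))
```

The dict holds two entries before the round trip and a single entry under the
string key "5" afterwards, with no error or warning raised at either step.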

    So if the signal with the explicit ID "5" is stored just before the one
    with reason "signal 4", then "signal 4" will effectively be silently
    ignored as the 5th signal to arrive - a slot already filled. And since that
    signal is ignored, the next signal will also be treated as the 5th to
    arrive and ignored, and so on. This leads inexorably to the dreaded
    "WaitConditionTimeout: resources.wait_condition: 4 of 25 received" error.

    For this reason, it's a bad idea to mix explicit IDs that are also integers
    with implicitly assigned IDs. Use an ID that won't collide instead.
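
The stall described above, and the effect of choosing a non-colliding ID, can
be sketched with a toy model. Everything here is an assumption made for
illustration: store_signal and simulate are hypothetical helpers, the implicit
ID is modelled as "number of signals stored so far, plus one", and "dup" is an
arbitrary non-numeric ID, not necessarily the one used in the actual patch.

```python
import json


def store_signal(metadata_json, reason, signal_id=None):
    """Toy model of storing one signal; not Heat's actual code."""
    data = json.loads(metadata_json)
    if signal_id is None:
        # Implicit ID: the position of this signal among those stored (an int).
        signal_id = len(data) + 1
    # The integer 5 never compares equal to the string "5", so this
    # overwrite check stays False even when the keys will collide in JSON.
    overwritten = signal_id in data
    data[signal_id] = reason
    return json.dumps(data), overwritten


def simulate(explicit_id, total=25):
    """Send `total` signals, the fourth of which carries an explicit ID."""
    metadata = "{}"
    warnings = 0
    for n in range(1, 4):
        metadata, hit = store_signal(metadata, "signal %d" % n)
        warnings += hit
    # The explicit signal is stored after only three previous signals.
    metadata, hit = store_signal(metadata, "explicit", explicit_id)
    warnings += hit
    for n in range(4, total):
        metadata, hit = store_signal(metadata, "signal %d" % n)
        warnings += hit
    return len(json.loads(metadata)), warnings


received_colliding, warnings_colliding = simulate("5")
received_safe, warnings_safe = simulate("dup")
print(received_colliding, received_safe)
```

In this model the run with the explicit ID "5" never gets past 4 stored
signals out of 25 and never reports an overwrite, while the run with the
non-numeric ID stores all 25.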

    This patch is backported from
    https://git.openstack.org/cgit/openstack/heat-tempest-plugin/commit/?id=2cff12bceb4b568cd8673c9ffa5668d37fcc9da9

    Change-Id: Ie2608285ba9c0ec3f1e4a8bbf1a147ce35ccae00
    Depends-On: https://review.openstack.org/550682
    Closes-Bug: #1738653