Allow for invalid packet sequence in keepalive

Bug #1621702 reported by Amrith Kumar
This bug affects 1 person
Affects: OpenStack DBaaS (Trove)
Status: Fix Released
Importance: High
Assigned to: Peter Stachowski

Bug Description

In the SQLAlchemy keep_alive class, connections to MariaDB fail because pymysql reports an invalid packet sequence. MariaDB seems to time out the client differently from MySQL and PXC, and that difference surfaces as the invalid packet sequence. It is now handled as a special-case exception.
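
The special-case handling described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code that was merged: the exception classes are local stand-ins for `sqlalchemy.exc.DisconnectionError`, the DBAPI `OperationalError`, and `pymysql.err.InternalError`; the function name `keep_alive_checkout` is hypothetical; and the matched message text "Packet sequence number wrong" is an assumption about what pymysql reports.

```python
# Stand-in exception types so the sketch runs without SQLAlchemy or pymysql.
class DisconnectionError(Exception):
    """Stand-in for sqlalchemy.exc.DisconnectionError."""


class OperationalError(Exception):
    """Stand-in for the DBAPI driver's OperationalError."""


class InternalError(Exception):
    """Stand-in for pymysql.err.InternalError."""


# MySQL client error codes meaning the connection is gone
# (2006: server has gone away, 2013: lost connection during query).
LOST_CONNECTION_CODES = (2006, 2013)


def keep_alive_checkout(dbapi_con):
    """Ping a pooled connection and report dead ones as disconnects.

    Hypothetical helper: in a real pool listener, raising
    DisconnectionError tells SQLAlchemy to discard the connection
    and try again with a fresh one.
    """
    try:
        dbapi_con.ping()
    except OperationalError as ex:
        # The usual MySQL/PXC timeout path: a known "connection lost" code.
        if ex.args and ex.args[0] in LOST_CONNECTION_CODES:
            raise DisconnectionError() from ex
        raise
    except InternalError as ex:
        # The MariaDB path from this bug: the timeout shows up as an
        # invalid packet sequence (message text is an assumption).
        if "Packet sequence number wrong" in str(ex):
            raise DisconnectionError() from ex
        raise
```

Any other `InternalError` is re-raised unchanged, so only the known timeout symptom is treated as a dropped connection.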

Changed in trove:
assignee: Peter Stachowski (peterstac) → Amrith (amrith)
status: New → In Progress
Amrith Kumar (amrith)
Changed in trove:
milestone: none → next
importance: Undecided → High
status: In Progress → Confirmed
Changed in trove:
status: Confirmed → In Progress
Changed in trove:
assignee: Amrith (amrith) → Peter Stachowski (peterstac)
OpenStack Infra (hudson-openstack) wrote : Fix merged to trove (master)

Reviewed: https://review.openstack.org/362347
Committed: https://git.openstack.org/cgit/openstack/trove/commit/?id=bd761989eead77eead58c91ccb30fcb53d7a5c5d
Submitter: Jenkins
Branch: master

commit bd761989eead77eead58c91ccb30fcb53d7a5c5d
Author: Peter Stachowski <email address hidden>
Date: Mon Aug 29 19:47:47 2016 +0000

    Allow for invalid packet sequence in keepalive

    In the SQLAlchemy keep_alive class, MariaDB is failing
    as pymysql reports an invalid packet sequence.
    MariaDB seems to timeout the client in a different
    way than MySQL and PXC, which manifests itself as the
    aforementioned invalid sequence. It is now handled
    as a special-case exception.

    With this fix, the MariaDB scenario tests now pass.

    The scenario tests were also tweaked a bit, which aided
    in the testing of the fix. 'group=instance' was created,
    plus instance_error properly interleaved with
    instance_create. _has_status now calls get_instance with
    the admin client so that any faults are accompanied by
    a relevant stack trace. Cases where the result code
    was being checked out-of-sequence were removed, and explicit
    calls to check the http code for the right client were added.

    The replication error messages for promote and eject were
    enhanced as well to attempt to debug spurious failures.
    One of those failures was 'Replication is not on after 60 seconds.'
    This was fixed by setting 'MASTER_CONNECT_RETRY' in the mariadb
    gtid replication strategy as was done in:
    https://review.openstack.org/#/c/188933

    Finally, backup_incremental was added to MariaDB supported
    groups and cleaned up elsewhere.

    Closes-Bug: #1621702
    Change-Id: Id6bde5a34e1d79eece3084f761dcd153c38ccbad
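
The 'MASTER_CONNECT_RETRY' setting mentioned in the commit message is an option of MariaDB's `CHANGE MASTER TO` statement that controls how many seconds a replica waits between reconnection attempts. The sketch below shows one way such a statement could be assembled. The helper name `build_change_master`, the default of 15 seconds, and the example values are assumptions for illustration and are not taken from the Trove change; plain string formatting is used only to keep the example short.

```python
def build_change_master(host, port, user, password, connect_retry=15):
    """Build a MariaDB GTID 'CHANGE MASTER TO' statement (hypothetical helper).

    connect_retry is the number of seconds between reconnect attempts;
    the default here is an assumed value, not the one used by Trove.
    """
    options = [
        "MASTER_HOST='%s'" % host,
        "MASTER_PORT=%d" % int(port),
        "MASTER_USER='%s'" % user,
        "MASTER_PASSWORD='%s'" % password,
        "MASTER_CONNECT_RETRY=%d" % int(connect_retry),
        "MASTER_USE_GTID=slave_pos",
    ]
    return "CHANGE MASTER TO " + ", ".join(options)


# Example with made-up connection details.
statement = build_change_master("10.0.0.5", 3306, "repl", "secret")
```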

Changed in trove:
status: In Progress → Fix Released