Bug #1896635 “Unresponsive services during failover” (kolla-ansible)

OpenStack Infra (hudson-openstack) on 2020-09-22

Changed in kolla-ansible:
assignee:	nobody → Pierre Riteau (priteau)
status:	New → In Progress

Mark Goddard (mgoddard) wrote on 2020-09-28:

#1

https://review.opendev.org/#/c/749632/

Changed in kolla-ansible:
importance:	Undecided → Medium

OpenStack Infra (hudson-openstack) wrote on 2020-09-28: Fix merged to kolla-ansible (master)

#2

Reviewed: https://review.opendev.org/749632
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=c81772024c70b84564cfddb29645390ae378498f
Submitter: Zuul
Branch: master

commit c81772024c70b84564cfddb29645390ae378498f
Author: Pierre Riteau <email address hidden>
Date: Tue Sep 22 17:52:36 2020 +0200

    Reduce the use of SQLAlchemy connection pooling

    When the internal VIP is moved in the event of a failure of the active
    controller, OpenStack services can become unresponsive as they try to
    talk with MariaDB using connections from the SQLAlchemy pool.

    It has been argued that OpenStack doesn't really need to use connection
    pooling with MariaDB [1]. This commit reduces the use of connection
    pooling via two configuration options:

    - max_pool_size is set to 1 to allow only a single connection in the
      pool (it is not possible to disable connection pooling entirely via
      oslo.db, and max_pool_size = 0 means unlimited pool size)
    - lower connection_recycle_time from the default of one hour to 10
      seconds, which means the single connection in the pool will be
      recreated regularly

    These settings have shown better reactivity of the system in the event
    of a failover.

    [1] http://lists.openstack.org/pipermail/openstack-dev/2015-April/061808.html

    Change-Id: Ib6a62d4428db9b95569314084090472870417f3d
    Closes-Bug: #1896635
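
For readers who want to see what the two options described in the commit message look like in a service configuration file, here is a minimal sketch. It is not the kolla-ansible change itself: it only renders the settings as an INI fragment using the Python standard library. The `[database]` section name is the group oslo.db reads these options from; the helper name `render_database_section` and the `POOL_SETTINGS` dictionary are hypothetical and exist only for this illustration.

```python
import configparser
import io

# Values taken from the commit message above. oslo.db reads both options
# from the [database] group of a service's configuration file.
POOL_SETTINGS = {
    "max_pool_size": "1",             # one pooled connection (0 would mean unlimited)
    "connection_recycle_time": "10",  # seconds; the default is one hour
}


def render_database_section(settings):
    """Return an INI fragment with a [database] section holding the settings.

    Hypothetical helper for illustration only; it is not part of
    kolla-ansible or oslo.db.
    """
    parser = configparser.ConfigParser()
    parser["database"] = settings
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the fragment so it can be compared with a deployed config file.
    print(render_database_section(POOL_SETTINGS))
```

The rendered fragment can be compared against the `[database]` section of a deployed service's configuration to check whether the reduced pooling settings are in effect.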

Changed in kolla-ansible:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2020-09-29: Fix proposed to kolla-ansible (stable/ussuri)

#3

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/754928

OpenStack Infra (hudson-openstack) wrote on 2020-09-29: Fix proposed to kolla-ansible (stable/train)

#4

Fix proposed to branch: stable/train
Review: https://review.opendev.org/754929

OpenStack Infra (hudson-openstack) wrote on 2020-09-29: Fix proposed to kolla-ansible (stable/stein)

#5

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/754931

OpenStack Infra (hudson-openstack) wrote on 2020-09-29: Fix merged to kolla-ansible (stable/train)

#6

Reviewed: https://review.opendev.org/754929
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=903efc5a9880eee960f131f7954ec568e6f60df1
Submitter: Zuul
Branch: stable/train

commit 903efc5a9880eee960f131f7954ec568e6f60df1
Author: Pierre Riteau <email address hidden>
Date: Tue Sep 22 17:52:36 2020 +0200

    Reduce the use of SQLAlchemy connection pooling

    When the internal VIP is moved in the event of a failure of the active
    controller, OpenStack services can become unresponsive as they try to
    talk with MariaDB using connections from the SQLAlchemy pool.

    It has been argued that OpenStack doesn't really need to use connection
    pooling with MariaDB [1]. This commit reduces the use of connection
    pooling via two configuration options:

    - max_pool_size is set to 1 to allow only a single connection in the
      pool (it is not possible to disable connection pooling entirely via
      oslo.db, and max_pool_size = 0 means unlimited pool size)
    - lower connection_recycle_time from the default of one hour to 10
      seconds, which means the single connection in the pool will be
      recreated regularly

    These settings have shown better reactivity of the system in the event
    of a failover.

    [1] http://lists.openstack.org/pipermail/openstack-dev/2015-April/061808.html

    Change-Id: Ib6a62d4428db9b95569314084090472870417f3d
    Closes-Bug: #1896635
    (cherry picked from commit c81772024c70b84564cfddb29645390ae378498f)

OpenStack Infra (hudson-openstack) wrote on 2020-09-29: Fix merged to kolla-ansible (stable/ussuri)

#7

Reviewed: https://review.opendev.org/754928
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=1d4fd52e4e77d5b93f21ccf9f382ae110459f3ff
Submitter: Zuul
Branch: stable/ussuri

commit 1d4fd52e4e77d5b93f21ccf9f382ae110459f3ff
Author: Pierre Riteau <email address hidden>
Date: Tue Sep 22 17:52:36 2020 +0200

    Reduce the use of SQLAlchemy connection pooling

    When the internal VIP is moved in the event of a failure of the active
    controller, OpenStack services can become unresponsive as they try to
    talk with MariaDB using connections from the SQLAlchemy pool.

    It has been argued that OpenStack doesn't really need to use connection
    pooling with MariaDB [1]. This commit reduces the use of connection
    pooling via two configuration options:

    - max_pool_size is set to 1 to allow only a single connection in the
      pool (it is not possible to disable connection pooling entirely via
      oslo.db, and max_pool_size = 0 means unlimited pool size)
    - lower connection_recycle_time from the default of one hour to 10
      seconds, which means the single connection in the pool will be
      recreated regularly

    These settings have shown better reactivity of the system in the event
    of a failover.

    [1] http://lists.openstack.org/pipermail/openstack-dev/2015-April/061808.html

    Change-Id: Ib6a62d4428db9b95569314084090472870417f3d
    Closes-Bug: #1896635
    (cherry picked from commit c81772024c70b84564cfddb29645390ae378498f)

OpenStack Infra (hudson-openstack) wrote on 2020-09-29: Fix merged to kolla-ansible (stable/stein)

#8

Reviewed: https://review.opendev.org/754931
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=9ee209080982e66acff3f15fd64c9db965084f8b
Submitter: Zuul
Branch: stable/stein

commit 9ee209080982e66acff3f15fd64c9db965084f8b
Author: Pierre Riteau <email address hidden>
Date: Tue Sep 22 17:52:36 2020 +0200

    Reduce the use of SQLAlchemy connection pooling

    When the internal VIP is moved in the event of a failure of the active
    controller, OpenStack services can become unresponsive as they try to
    talk with MariaDB using connections from the SQLAlchemy pool.

    It has been argued that OpenStack doesn't really need to use connection
    pooling with MariaDB [1]. This commit reduces the use of connection
    pooling via two configuration options:

    - max_pool_size is set to 1 to allow only a single connection in the
      pool (it is not possible to disable connection pooling entirely via
      oslo.db, and max_pool_size = 0 means unlimited pool size)
    - lower connection_recycle_time from the default of one hour to 10
      seconds, which means the single connection in the pool will be
      recreated regularly

    These settings have shown better reactivity of the system in the event
    of a failover.

    [1] http://lists.openstack.org/pipermail/openstack-dev/2015-April/061808.html

    Change-Id: Ib6a62d4428db9b95569314084090472870417f3d
    Closes-Bug: #1896635
    (cherry picked from commit c81772024c70b84564cfddb29645390ae378498f)

OpenStack Infra (hudson-openstack) wrote on 2020-11-24: Fix included in openstack/kolla-ansible 8.3.0

#9

This issue was fixed in the openstack/kolla-ansible 8.3.0 release.

OpenStack Infra (hudson-openstack) wrote on 2021-01-07: Fix included in openstack/kolla-ansible 10.2.0

#10

This issue was fixed in the openstack/kolla-ansible 10.2.0 release.

OpenStack Infra (hudson-openstack) wrote on 2021-01-07: Fix included in openstack/kolla-ansible 9.3.0

#11

This issue was fixed in the openstack/kolla-ansible 9.3.0 release.

Affects	Status	Importance	Assigned to	Milestone
kolla-ansible	Fix Released	Medium	Pierre Riteau	kolla-ansible 11.0.0 "victoria"
Stein	Fix Released	Medium	Mark Goddard	kolla-ansible 8.3.0 "stein"
Train	Fix Released	Medium	Mark Goddard	kolla-ansible 9.3.0 "Train"
Ussuri	Fix Released	Medium	Mark Goddard	kolla-ansible 10.2.0 "ussuri"
Victoria	Fix Released	Medium	Pierre Riteau	kolla-ansible 11.0.0 "victoria"