QPID reconnection delay can't be configured

Bug #1281148 reported by Flavio Percoco
This bug affects 1 person
Affects                          Status         Importance  Assigned to      Milestone
Ceilometer                       Fix Released   High        Flavio Percoco
  Havana                         Fix Released   Undecided   Flavio Percoco
Cinder                           Invalid        Undecided   Unassigned
  Havana                         Fix Released   High        Flavio Percoco
OpenStack Compute (nova)         Invalid        Undecided   Unassigned
  Havana                         Fix Released   High        Flavio Percoco
OpenStack Identity (keystone)    Invalid        Undecided   Unassigned
  Havana                         Fix Released   High        Flavio Percoco
neutron                          Fix Released   High        Ihar Hrachyshka
  Havana                         Fix Released   High        Ihar Hrachyshka
oslo-incubator                   Fix Released   High        Flavio Percoco
  Havana                         Fix Committed  High        Flavio Percoco
oslo.messaging                   Fix Released   High        Flavio Percoco

Bug Description

Currently, qpid's reconnection delay can grow to as much as 60s and it is not configurable. This is unfortunate because 60s is a long time to wait in an HA system, which makes this issue a blocker for this kind of deployment.

description: updated
Changed in oslo.messaging:
importance: Undecided → High
assignee: nobody → Flavio Percoco (flaper87)
milestone: none → icehouse-3
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/74080

Changed in oslo.messaging:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo-incubator (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/74087

OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/74315

OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/74315
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=dc1f984dacced6bbd56b4d97847be8a39b61036e
Submitter: Jenkins
Branch: master

commit dc1f984dacced6bbd56b4d97847be8a39b61036e
Author: Flavio Percoco <email address hidden>
Date: Tue Feb 18 10:56:21 2014 +0100

    Use a more accurate max_delay for reconnects

    In an HA deployment, a 60-second delay between reconnects can be quite
    problematic. This patch changes the delay calculation by setting the max
    delay to 5s and by changing the way it is increased.

    Unfortunately, this is one of the places where our two main drivers are
    not consistent. Rabbit's driver uses configuration parameters for this
    whereas qpid's driver has never had one. However, I would prefer not
    to add configuration parameters to qpid's driver for the following
    reasons:

        1. Most OpenStack services depend on the messaging layer, hence
        they need it to be available. A 5s delay seems reasonable and
        I could argue against the need to tune it further. Although such
        frequent reconnects can add load to the network, that wouldn't be
        the main issue if one of the brokers goes down.
        2. We're trying to move away from configuration options towards using
        transport URLs. This path is still not clear and I would prefer
        to avoid adding new options until we clear it up.

    Change-Id: I537015f452eb770acba41fdedfe221628f52a920
    Closes-bug: #1281148
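
The commit message describes the change only in words. As a rough illustration of why the cap matters, here is a minimal sketch that compares two delay schedules. It assumes the old behaviour doubled the delay up to 60s and the new one adds one second per attempt up to 5s. Both growth rules and every name below are illustrative assumptions, not code taken from the patch.

```python
import itertools


def doubling_delays(initial=1, max_delay=60):
    """Yield reconnect delays that double after each failure (assumed old rule)."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, max_delay)


def stepped_delays(initial=1, step=1, max_delay=5):
    """Yield reconnect delays that grow by a fixed step (assumed new rule)."""
    delay = initial
    while True:
        yield delay
        delay = min(delay + step, max_delay)


def first_delays(schedule, attempts):
    """Return the delays used for the first `attempts` failed reconnects."""
    return list(itertools.islice(schedule, attempts))


if __name__ == "__main__":
    print(first_delays(doubling_delays(), 8))
    print(first_delays(stepped_delays(), 8))
```

Under these assumptions a service that has already failed several times waits at most 5s before the next attempt rather than up to 60s, which is the property the bug report asks for in HA deployments.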

Changed in oslo.messaging:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo-incubator (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/76458

Ben Nemec (bnemec)
Changed in oslo:
status: New → Triaged
importance: Undecided → High
Thierry Carrez (ttx)
Changed in oslo.messaging:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo-incubator (master)

Fix proposed to branch: master
Review: https://review.openstack.org/79304

Changed in oslo:
assignee: nobody → Flavio Percoco (flaper87)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo-incubator (master)

Reviewed: https://review.openstack.org/79304
Committed: https://git.openstack.org/cgit/openstack/oslo-incubator/commit/?id=8b628d1e024f787dbb93d508117d9148388c0590
Submitter: Jenkins
Branch: master

commit 8b628d1e024f787dbb93d508117d9148388c0590
Author: Flavio Percoco <email address hidden>
Date: Tue Feb 18 10:56:21 2014 +0100

    Use a more accurate max_delay for reconnects

    In an HA deployment, a 60-second delay between reconnects can be quite
    problematic. This patch changes the delay calculation by setting the max
    delay to 5s and by changing the way it is increased.

    Unfortunately, this is one of the places where our two main drivers are
    not consistent. Rabbit's driver uses configuration parameters for this
    whereas qpid's driver has never had one. However, I would prefer not
    to add configuration parameters to qpid's driver for the following
    reasons:

        1. Most OpenStack services depend on the messaging layer, hence
        they need it to be available. A 5s delay seems reasonable and
        I could argue against the need to tune it further. Although such
        frequent reconnects can add load to the network, that wouldn't be
        the main issue if one of the brokers goes down.
        2. We're trying to move away from configuration options towards using
        transport URLs. This path is still not clear and I would prefer
        to avoid adding new options until we clear it up.

    Closes-bug: #1281148
    Change-Id: I537015f452eb770acba41fdedfe221628f52a920

Changed in oslo:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/80998

Changed in neutron:
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
status: New → In Progress
Alan Pevec (apevec)
Changed in nova:
status: New → Invalid
Changed in cinder:
status: New → Invalid
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo-incubator (stable/havana)

Reviewed: https://review.openstack.org/76458
Committed: https://git.openstack.org/cgit/openstack/oslo-incubator/commit/?id=ad4dfa7171ac7da1c0fd1020f57f3430fa3bb056
Submitter: Jenkins
Branch: stable/havana

commit ad4dfa7171ac7da1c0fd1020f57f3430fa3bb056
Author: Flavio Percoco <email address hidden>
Date: Tue Feb 18 10:56:21 2014 +0100

    Use a more accurate max_delay for reconnects

    In an HA deployment, a 60-second delay between reconnects can be quite
    problematic. This patch changes the delay calculation by setting the max
    delay to 5s and by changing the way it is increased.

    Unfortunately, this is one of the places where our two main drivers are
    not consistent. Rabbit's driver uses configuration parameters for this
    whereas qpid's driver has never had one. However, I would prefer not
    to add configuration parameters to qpid's driver for the following
    reasons:

        1. Most OpenStack services depend on the messaging layer, hence
        they need it to be available. A 5s delay seems reasonable and
        I could argue against the need to tune it further. Although such
        frequent reconnects can add load to the network, that wouldn't be
        the main issue if one of the brokers goes down.
        2. We're trying to move away from configuration options towards using
        transport URLs. This path is still not clear and I would prefer
        to avoid adding new options until we clear it up.

    Closes-bug: #1281148

    Change-Id: I537015f452eb770acba41fdedfe221628f52a920
    (cherry picked from commit 8b628d1e024f787dbb93d508117d9148388c0590)

Alan Pevec (apevec)
Changed in neutron:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/82779

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/82786

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/havana)

Reviewed: https://review.openstack.org/82786
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7871486133b67be2b1782a67d9d018747cb18571
Submitter: Jenkins
Branch: stable/havana

commit 7871486133b67be2b1782a67d9d018747cb18571
Author: Ihar Hrachyshka <email address hidden>
Date: Tue Mar 25 12:13:23 2014 +0100

    Use a more accurate max_delay for reconnects

    In an HA deployment, a 60-second delay between reconnects can be quite
    problematic. This patch changes the delay calculation by setting the max
    delay to 5s and by changing the way it is increased.

    Unfortunately, this is one of the places where our two main drivers are
    not consistent. Rabbit's driver uses configuration parameters for this
    whereas qpid's driver has never had one. However, I would prefer not
    to add configuration parameters to qpid's driver for the following
    reasons:

        1. Most OpenStack services depend on the messaging layer, hence
        they need it to be available. A 5s delay seems reasonable and
        I could argue against the need to tune it further. Although such
        frequent reconnects can add load to the network, that wouldn't be
        the main issue if one of the brokers goes down.
        2. We're trying to move away from configuration options towards using
        transport URLs. This path is still not clear and I would prefer
        to avoid adding new options until we clear it up.

    (the patch is synced from oslo-incubator)

    Change-Id: I537015f452eb770acba41fdedfe221628f52a920
    Closes-bug: #1281148

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/83686

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/83689

Changed in keystone:
assignee: nobody → Flavio Percoco (flaper87)
importance: Undecided → High
milestone: none → 2013.2.3
status: New → In Progress
assignee: Flavio Percoco (flaper87) → nobody
status: In Progress → Invalid
importance: High → Undecided
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/havana)

Reviewed: https://review.openstack.org/83686
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=c5cdf8261d9f7fb42ca59a12554d1c0ed9cb0a7e
Submitter: Jenkins
Branch: stable/havana

commit c5cdf8261d9f7fb42ca59a12554d1c0ed9cb0a7e
Author: Flavio Percoco <email address hidden>
Date: Tue Feb 18 10:56:21 2014 +0100

    Use a more accurate max_delay for reconnects

    In an HA deployment, a 60-second delay between reconnects can be quite
    problematic. This patch changes the delay calculation by setting the max
    delay to 5s and by changing the way it is increased.

    Unfortunately, this is one of the places where our two main drivers are
    not consistent. Rabbit's driver uses configuration parameters for this
    whereas qpid's driver has never had one. However, I would prefer not
    to add configuration parameters to qpid's driver for the following
    reasons:

        1. Most OpenStack services depend on the messaging layer, hence
        they need it to be available. A 5s delay seems reasonable and
        I could argue against the need to tune it further. Although such
        frequent reconnects can add load to the network, that wouldn't be
        the main issue if one of the brokers goes down.
        2. We're trying to move away from configuration options towards using
        transport URLs. This path is still not clear and I would prefer
        to avoid adding new options until we clear it up.

    Closes-bug: #1281148

    Change-Id: I537015f452eb770acba41fdedfe221628f52a920
    (cherry picked from commit 8b628d1e024f787dbb93d508117d9148388c0590)

OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (stable/havana)

Reviewed: https://review.openstack.org/83738
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=a96d1a44bc0f074729c312e5c2a0f0875edf1765
Submitter: Jenkins
Branch: stable/havana

commit a96d1a44bc0f074729c312e5c2a0f0875edf1765
Author: Flavio Percoco <email address hidden>
Date: Tue Feb 18 10:56:21 2014 +0100

    Use a more accurate max_delay for reconnects

    In an HA deployment, a 60-second delay between reconnects can be quite
    problematic. This patch changes the delay calculation by setting the max
    delay to 5s and by changing the way it is increased.

    Unfortunately, this is one of the places where our two main drivers are
    not consistent. Rabbit's driver uses configuration parameters for this
    whereas qpid's driver has never had one. However, I would prefer not
    to add configuration parameters to qpid's driver for the following
    reasons:

        1. Most OpenStack services depend on the messaging layer, hence
        they need it to be available. A 5s delay seems reasonable and
        I could argue against the need to tune it further. Although such
        frequent reconnects can add load to the network, that wouldn't be
        the main issue if one of the brokers goes down.
        2. We're trying to move away from configuration options towards using
        transport URLs. This path is still not clear and I would prefer
        to avoid adding new options until we clear it up.

    Closes-bug: #1281148

    Change-Id: I537015f452eb770acba41fdedfe221628f52a920
    (cherry picked from commit 8b628d1e024f787dbb93d508117d9148388c0590)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/83689
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ecb058de339da8ee828b629c072f4dbb8541fc74
Submitter: Jenkins
Branch: stable/havana

commit ecb058de339da8ee828b629c072f4dbb8541fc74
Author: Flavio Percoco <email address hidden>
Date: Tue Feb 18 10:56:21 2014 +0100

    Use a more accurate max_delay for reconnects

    In an HA deployment, a 60-second delay between reconnects can be quite
    problematic. This patch changes the delay calculation by setting the max
    delay to 5s and by changing the way it is increased.

    Unfortunately, this is one of the places where our two main drivers are
    not consistent. Rabbit's driver uses configuration parameters for this
    whereas qpid's driver has never had one. However, I would prefer not
    to add configuration parameters to qpid's driver for the following
    reasons:

        1. Most OpenStack services depend on the messaging layer, hence
        they need it to be available. A 5s delay seems reasonable and
        I could argue against the need to tune it further. Although such
        frequent reconnects can add load to the network, that wouldn't be
        the main issue if one of the brokers goes down.
        2. We're trying to move away from configuration options towards using
        transport URLs. This path is still not clear and I would prefer
        to avoid adding new options until we clear it up.

    Closes-bug: #1281148

    Change-Id: I537015f452eb770acba41fdedfe221628f52a920
    (cherry picked from commit 8b628d1e024f787dbb93d508117d9148388c0590)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/84090

Changed in ceilometer:
assignee: nobody → Flavio Percoco (flaper87)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/80998
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=665222b38b7580a30f50c1fef78feebcae666f40
Submitter: Jenkins
Branch: master

commit 665222b38b7580a30f50c1fef78feebcae666f40
Author: Ihar Hrachyshka <email address hidden>
Date: Mon Mar 17 14:18:28 2014 +0100

    Synced rpc and gettextutils modules from oslo-incubator

    The main reason for sync is to get the following oslo-rpc fixes in Neutron:
    * I537015f452eb770acba41fdedfe221628f52a920 (reduces delays when reconnecting
      to Qpid in HA deployments)
    * Ia148baa6e1ec632789ac3621c85173c2c16f3918 (fixed HA failover, Qpid part)
    * I67923cb024bbd143edc8edccf35b9b400df31eb3 (fixed HA failover, RabbitMQ part)

    Latest oslo-incubator commit at the moment of sync:
    * 2eab986ef3c43f8d1e25065e3cbc1307860c25c7

    Change-Id: I2f5bb0d195e050f755ecdbf06a6bbed587a04fbe
    Closes-Bug: 1281148
    Closes-Bug: 1261631

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (master)

Reviewed: https://review.openstack.org/84090
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=93471a2f3ee1bf4d41fa1a21375eaba3942b003a
Submitter: Jenkins
Branch: master

commit 93471a2f3ee1bf4d41fa1a21375eaba3942b003a
Author: Flavio Percoco <email address hidden>
Date: Tue Feb 18 10:56:21 2014 +0100

    Use a more accurate max_delay for reconnects

    In an HA deployment, a 60-second delay between reconnects can be quite
    problematic. This patch changes the delay calculation by setting the max
    delay to 5s and by changing the way it is increased.

    Unfortunately, this is one of the places where our two main drivers are
    not consistent. Rabbit's driver uses configuration parameters for this
    whereas qpid's driver has never had one. However, I would prefer not
    to add configuration parameters to qpid's driver for the following
    reasons:

        1. Most OpenStack services depend on the messaging layer, hence
        they need it to be available. A 5s delay seems reasonable and
        I could argue against the need to tune it further. Although such
        frequent reconnects can add load to the network, that wouldn't be
        the main issue if one of the brokers goes down.
        2. We're trying to move away from configuration options towards using
        transport URLs. This path is still not clear and I would prefer
        to avoid adding new options until we clear it up.

    Closes-bug: #1281148

    Change-Id: I537015f452eb770acba41fdedfe221628f52a920
    (cherry picked from commit 8b628d1e024f787dbb93d508117d9148388c0590)

Changed in ceilometer:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → icehouse-rc1
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (stable/havana)

Reviewed: https://review.openstack.org/83739
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=4ffeeadc5caef1ad68c3dbc3f5eec5f74788391b
Submitter: Jenkins
Branch: stable/havana

commit 4ffeeadc5caef1ad68c3dbc3f5eec5f74788391b
Author: Flavio Percoco <email address hidden>
Date: Tue Feb 18 10:56:21 2014 +0100

    Use a more accurate max_delay for reconnects

    In an HA deployment, a 60-second delay between reconnects can be quite
    problematic. This patch changes the delay calculation by setting the max
    delay to 5s and by changing the way it is increased.

    Unfortunately, this is one of the places where our two main drivers are
    not consistent. Rabbit's driver uses configuration parameters for this
    whereas qpid's driver has never had one. However, I would prefer not
    to add configuration parameters to qpid's driver for the following
    reasons:

        1. Most OpenStack services depend on the messaging layer, hence
        they need it to be available. A 5s delay seems reasonable and
        I could argue against the need to tune it further. Although such
        frequent reconnects can add load to the network, that wouldn't be
        the main issue if one of the brokers goes down.
        2. We're trying to move away from configuration options towards using
        transport URLs. This path is still not clear and I would prefer
        to avoid adding new options until we clear it up.

    Closes-bug: #1281148

    Change-Id: I537015f452eb770acba41fdedfe221628f52a920
    (cherry picked from commit 93471a2f3ee1bf4d41fa1a21375eaba3942b003a)

Changed in keystone:
milestone: 2013.2.3 → none
Thierry Carrez (ttx)
Changed in oslo:
milestone: none → icehouse-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in oslo:
milestone: icehouse-rc1 → 2014.1
Thierry Carrez (ttx)
Changed in oslo.messaging:
milestone: icehouse-3 → 1.3.0
Thierry Carrez (ttx)
Changed in neutron:
milestone: icehouse-rc1 → 2014.1
Eoghan Glynn (eglynn)
Changed in ceilometer:
importance: Undecided → High
milestone: none → juno-1
Thierry Carrez (ttx)
Changed in ceilometer:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in ceilometer:
milestone: juno-1 → 2014.2