Lost heartbeat does not trigger a reconnect to the RabbitMQ server

Bug #1493890 reported by baoyonglei
This bug affects 1 person
Affects: oslo.messaging
Status: Fix Released
Importance: Undecided
Assigned to: Mehdi Abaakouk
Milestone: (none listed)

Bug Description

For a connection used for sending, a heartbeat-check thread is started that runs the function _heartbeat_thread_job(self).

When it detects that the heartbeat from rabbitmq-server has been lost, amqp raises the "Too many heartbeats missed" exception. _heartbeat_thread_job catches the exception and calls ensure_connection().

I think the purpose of this call is to reconnect to rabbitmq-server, but I found that ensure_connection() does not actually reconnect.

As far as I can tell, the reason is that the connection already exists, so the function does nothing and kombu.connection.autoretry never reconnects.
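
The behaviour described above can be illustrated with a small self-contained model. This is only a sketch: FakeConnection and its attributes are hypothetical names invented for illustration and are not the kombu or oslo.messaging API. It assumes, as the report suggests, that the "ensure" step reconnects only when the local object believes it is disconnected.

```python
class FakeConnection:
    """Hypothetical stand-in for a broker connection (not the kombu API)."""

    def __init__(self):
        self.connected = False
        self.connect_calls = 0

    def connect(self):
        self.connect_calls += 1
        self.connected = True

    def ensure_connection(self):
        # Assumed behaviour, following the report: reconnect only when the
        # object believes it is disconnected.
        if not self.connected:
            self.connect()


def simulate_lost_heartbeat():
    conn = FakeConnection()
    conn.connect()            # initial connection
    # The broker stops answering heartbeats, but nothing updates the local
    # state, so conn.connected is still True at this point.
    conn.ensure_connection()  # intended to reconnect, but is a no-op
    return conn.connect_calls


print(simulate_lost_heartbeat())
```

In this model connect() runs only once, which matches the symptom in the report: the error is caught, but no new connection is made.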

I tried the following modification in my environment, and it works:

try:
    # Proposed change: instead of calling
    #     self._heartbeat_check()
    # directly, run it through ensure():
    self.ensure(error_callback=None, method=self._heartbeat_check)
    # NOTE(sileht): We need to drain event to receive
    # heartbeat from the broker but don't hold the
    # connection too much times. In amqpdriver a connection
    # is used exclusivly for read or for write, so we have
    # to do this for connection used for write drain_events
    # already do that for other connection
    try:
        self.connection.drain_events(timeout=0.001)
    except socket.timeout:
        pass
except recoverable_errors as exc:
    LOG.info(_LI("A recoverable connection/channel error "
                 "occurred, trying to reconnect: %s"), exc)
    self.ensure_connection()
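
The idea behind the modification above, running the heartbeat check through a wrapper that reconnects and retries on failure, can be sketched generically. Everything here is an assumption made for illustration: RecoverableError, FlakyLink and this ensure() helper are hypothetical and do not reproduce the signature or behaviour of the real ensure() in oslo.messaging or kombu.

```python
class RecoverableError(Exception):
    """Hypothetical stand-in for the driver's recoverable errors."""


def ensure(method, reconnect, max_retries=3):
    """Call method(); on a recoverable error, reconnect and try again."""
    for attempt in range(max_retries + 1):
        try:
            return method()
        except RecoverableError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            reconnect()


class FlakyLink:
    """Hypothetical link whose heartbeat check fails until reconnected."""

    def __init__(self):
        self.alive = False
        self.reconnects = 0

    def heartbeat_check(self):
        if not self.alive:
            raise RecoverableError("heartbeat missed")
        return "ok"

    def reconnect(self):
        self.reconnects += 1
        self.alive = True


link = FlakyLink()
result = ensure(link.heartbeat_check, link.reconnect)
print(result, link.reconnects)
```

The point of the wrapper is that the failure itself drives the reconnect, rather than relying on a later call that may decide nothing needs to be done.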

OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/253510

Changed in oslo.messaging:
assignee: nobody → Mehdi Abaakouk (sileht)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/253511

OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/253514

Changed in oslo.messaging:
assignee: Mehdi Abaakouk (sileht) → Davanum Srinivas (DIMS) (dims-v)
Mehdi Abaakouk (sileht)
Changed in oslo.messaging:
assignee: Davanum Srinivas (DIMS) (dims-v) → Mehdi Abaakouk (sileht)
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/253510
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=148e8380ce1cc4f60716300b95104aaa2cf8c543
Submitter: Jenkins
Branch: master

commit 148e8380ce1cc4f60716300b95104aaa2cf8c543
Author: Mehdi Abaakouk <email address hidden>
Date: Fri Dec 4 14:57:03 2015 +0100

    Fix reconnection when heartbeat is missed

    When a heartbeat is missing we call ensure_connection()
    that runs a dummy method to trigger the reconnection
    code in kombu. But also the code is triggered only if the
    channel is None.

    In case of the heartbeat threads we didn't reset the channel
    before reconnecting, so the dummy method doesn't do anything.

    This change sets the channel to None to ensure the connection
    is reestablished before the dummy method is run.

    Also it replaces the dummy method by checking the kombu connection
    object. So we are sure the connection is reestablished.

    Change-Id: I39f8cd23c5a5498e6f4c1aa3236ed27f3b5d7c9a
    Closes-bug: #1493890

Changed in oslo.messaging:
status: In Progress → Fix Released
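
The fix described in the commit message above can be modelled in a few lines. This is a simplified sketch, not the actual patch: FakeDriverConnection and its methods are hypothetical, and the only behaviour taken from the commit message is that the reconnection code runs only when the channel is None and that the fix resets the channel before reconnecting.

```python
class FakeDriverConnection:
    """Hypothetical model of the driver's connection wrapper."""

    def __init__(self):
        self.channel = None
        self.reconnects = 0
        self.ensure_connection()  # initial connection

    def _reconnect(self):
        self.reconnects += 1
        self.channel = object()   # stands in for a fresh channel

    def ensure_connection(self):
        # Per the commit message: the reconnection code is triggered
        # only if the channel is None.
        if self.channel is None:
            self._reconnect()

    def on_heartbeat_missed_before_fix(self):
        # The channel is not reset, so ensure_connection() does nothing.
        self.ensure_connection()

    def on_heartbeat_missed_after_fix(self):
        # Reset the channel first so the reconnection code really runs.
        self.channel = None
        self.ensure_connection()


conn = FakeDriverConnection()
conn.on_heartbeat_missed_before_fix()
print(conn.reconnects)
conn.on_heartbeat_missed_after_fix()
print(conn.reconnects)
```

In this model the reconnect counter stays unchanged with the old handler and increases once the channel is reset first.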
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (stable/liberty)

Reviewed: https://review.openstack.org/253514
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=006b74a64223306d1871550156f8d73bc96a2cc7
Submitter: Jenkins
Branch: stable/liberty

commit 006b74a64223306d1871550156f8d73bc96a2cc7
Author: Mehdi Abaakouk <email address hidden>
Date: Fri Dec 4 14:57:03 2015 +0100

    Fix reconnection when heartbeat is missed

    When a heartbeat is missing we call ensure_connection()
    that runs a dummy method to trigger the reconnection
    code in kombu. But also the code is triggered only if the
    channel is None.

    In case of the heartbeat threads we didn't reset the channel
    before reconnecting, so the dummy method doesn't do anything.

    This change sets the channel to None to ensure the connection
    is reestablished before the dummy method is run.

    Also it replaces the dummy method by checking the kombu connection
    object. So we are sure the connection is reestablished.

    Closes-bug: #1493890
    (cherry picked from commit I39f8cd23c5a5498e6f4c1aa3236ed27f3b5d7c9a)

    Depends-On: Ibce834c3e76d71a770013cf1b469aa86396751b9
    Change-Id: Iee70ea7ff3816802195b29ba231fadddbe6159da

tags: added: in-stable-liberty
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (stable/kilo)

Reviewed: https://review.openstack.org/253511
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=47c6c8ebbf92c8794ebbeaca76e3728e4a748f69
Submitter: Jenkins
Branch: stable/kilo

commit 47c6c8ebbf92c8794ebbeaca76e3728e4a748f69
Author: Mehdi Abaakouk <email address hidden>
Date: Fri Dec 4 14:57:03 2015 +0100

    Fix reconnection when heartbeat is missed

    When a heartbeat is missing we call ensure_connection()
    that runs a dummy method to trigger the reconnection
    code in kombu. But also the code is triggered only if the
    channel is None.

    In case of the heartbeat threads we didn't reset the channel
    before reconnecting, so the dummy method doesn't do anything.

    This change sets the channel to None to ensure the connection
    is reestablished before the dummy method is run.

    Also it replaces the dummy method by checking the kombu connection
    object. So we are sure the connection is reestablished.

    Closes-bug: #1493890
    (cherry picked from commit I39f8cd23c5a5498e6f4c1aa3236ed27f3b5d7c9a)

    Change-Id: Id98d4054ecbc787e0d44884a9e4c48e3fae803b2

tags: added: in-stable-kilo
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (feature/pika)

Fix proposed to branch: feature/pika
Review: https://review.openstack.org/257373

OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (feature/pika)

Reviewed: https://review.openstack.org/257373
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=cc0f8cc8a9ff25c9fb081cac5366c12a0c06ec53
Submitter: Jenkins
Branch: feature/pika

commit a5d78891745b6b9e5827271dc305f00acae1392f
Author: OpenStack Proposal Bot <email address hidden>
Date: Fri Dec 11 15:24:05 2015 +0000

    Updated from global requirements

    Change-Id: Ifd78016c067740477a82dbe06d74d5944ba91893

commit 17ccb2306d03a74304c57d31716a54ba2b3b4311
Author: Mehdi Abaakouk <email address hidden>
Date: Fri Dec 11 10:59:54 2015 +0100

    Move to debug a too verbose log

    When a client is gone (died/restart) and somes replies cannot be sent because
    the the exchange of this client will never comeback. We log one message per
    reply every 0.25 messages during 60 seconds. When the only useful log
    is the one where we decide to drop this replies.

    This change moves the less important message to debug level.

    Change-Id: I508787c0db4dcec2c0027b89eb4e65c4f98022b9
    Related-bug: #1524418

commit 46daf858144202a072c4bf8580aeafec11d20e13
Author: Davanum Srinivas <email address hidden>
Date: Fri Dec 11 11:04:13 2015 +0300

    Cleanup parameter docstrings

    Change-Id: I301fdd51446bf0c0a6dd0d05b26da0556db8367d

commit 3ee86964fa460882d8fcac8686edd0e6bfb12008
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Dec 9 19:37:40 2015 +0100

    Revert "default of kombu_missing_consumer_retry_timeout"

    This reverts commit 8c03a6db6c0396099e7425834998da5478a1df7c.

    Closes-bug: #1524418
    Change-Id: I35538a6c15d6402272e4513bc1beaa537b0dd7b9

commit e72599435c59c09277a9da7686b32aa4f9df7ba4
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Dec 9 18:49:19 2015 +0100

    Don't trigger error_callback for known exc

    When AMQPDestinationNotFound is raised, we must not
    call the error_callback method. The exception is logged
    only if needed in upper layer (amqpdriver.py).

    Related-bug: #1524418

    Change-Id: Ic1ddec2d13172532dbaa572d04a4c22c97ac4fe7

commit 185693a6ed57e02b2f94b0fb8f14a91471605969
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Dec 9 11:23:52 2015 +0100

    Improves comment

    Change-Id: Idc8002e6d622435aac48304857985c0f82be3e32

commit 148e8380ce1cc4f60716300b95104aaa2cf8c543
Author: Mehdi Abaakouk <email address hidden>
Date: Fri Dec 4 14:57:03 2015 +0100

    Fix reconnection when heartbeat is missed

    When a heartbeat is missing we call ensure_connection()
    that runs a dummy method to trigger the reconnection
    code in kombu. But also the code is triggered only if the
    channel is None.

    In case of the heartbeat threads we didn't reset the channel
    before reconnecting, so the dummy method doesn't do anything.

    This change sets the channel to None to ensure the connection
    is reestablished before the dummy method is run.

    Also it replaces the dummy method by checking the kombu connection
    object. So we are sure the connection is reestablished.

    Change-Id: I39f8cd23c5a5498e6f4c1aa3236ed27f3b5d7c9a
    Closes-bug: #1493890

commit 05002...

tags: added: in-feature-pika