Heartbeat thread does not recover gracefully from reconnect

Bug #1799546 reported by John Eckersberg
This bug affects 2 people
Affects: oslo.messaging
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

(This is on stable/queens; I still need to compare against the latest.)

After stopping one RabbitMQ server in a three-node cluster, the connection fails over to another node as expected, but the heartbeat thread lingers and spams the logs with ECONNREFUSED errors indefinitely:

Oct 23 12:12:54 centos7 nova-scheduler[7844]: ERROR oslo.messaging._drivers.impl_rabbit [-] [5c4c43cf-5387-4290-8621-7ae97f980ed6] AMQP server on 192.168.122.198:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds. Client port: None: error: [Errno 104] Connection reset by peer
Oct 23 12:12:55 centos7 nova-scheduler[7844]: ERROR oslo.messaging._drivers.impl_rabbit [-] [5c4c43cf-5387-4290-8621-7ae97f980ed6] AMQP server on 192.168.122.198:5672 is unreachable: <AMQPError: unknown error>. Trying again in 1 seconds. Client port: None: RecoverableConnectionError: <AMQPError: unknown error>
Oct 23 12:12:56 centos7 nova-scheduler[7844]: ERROR oslo.messaging._drivers.impl_rabbit [-] [5c4c43cf-5387-4290-8621-7ae97f980ed6] AMQP server on 192.168.122.198:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds. Client port: None: error: [Errno 111] ECONNREFUSED
Oct 23 12:12:57 centos7 nova-scheduler[7844]: INFO oslo.messaging._drivers.impl_rabbit [-] [5c4c43cf-5387-4290-8621-7ae97f980ed6] Reconnected to AMQP server on 192.168.122.198:5674 via [amqp] client with port 57398.
Oct 23 12:13:03 centos7 nova-scheduler[7844]: WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 104] Connection reset by peer
Oct 23 12:13:18 centos7 nova-scheduler[7844]: WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] ECONNREFUSED
Oct 23 12:13:33 centos7 nova-scheduler[7844]: WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] ECONNREFUSED
Oct 23 12:13:48 centos7 nova-scheduler[7844]: WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] ECONNREFUSED
Oct 23 12:14:03 centos7 nova-scheduler[7844]: WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] ECONNREFUSED
Oct 23 12:14:19 centos7 nova-scheduler[7844]: WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] ECONNREFUSED
Oct 23 12:14:34 centos7 nova-scheduler[7844]: WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] ECONNREFUSED
...

Unsurprisingly, the error stops when the RabbitMQ server comes back online.
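
For reference, the spacing of the heartbeat warnings can be read straight off the log excerpt above. The snippet below is only a minimal sketch for doing that arithmetic: the timestamps are copied by hand from the WARNING lines, and `gaps_in_seconds` is a hypothetical helper name, not anything from oslo.messaging.

```python
from datetime import datetime

# Timestamps copied by hand from the WARNING lines in the log excerpt above.
WARNING_TIMES = [
    "12:13:03", "12:13:18", "12:13:33", "12:13:48",
    "12:14:03", "12:14:19", "12:14:34",
]


def gaps_in_seconds(stamps):
    """Return the seconds between consecutive HH:MM:SS stamps (same day assumed)."""
    parsed = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


print(gaps_in_seconds(WARNING_TIMES))  # [15, 15, 15, 15, 16, 15]
```

So the warning repeats roughly every 15 seconds, which looks like a periodic retry rather than a tight loop. This says nothing about the root cause; it only quantifies the log noise.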

John Eckersberg (jeckersb) wrote :

This seems more difficult to reproduce than I originally thought. It happened the first time I tried the failover, and then after a bunch more iterations it only happened one more time.

Ben Nemec (bnemec)
Changed in oslo.messaging:
status: New → Confirmed
importance: Undecided → Medium