Comment 0 for bug 1688581

Dmitry Mescheryakov (dmitrymex) wrote:

Version: 9.x

Steps to reproduce:
1. Deploy a MOS env with 3 controllers and 1 compute node
2. Download the following file and save it as simulator.py: http://paste.openstack.org/show/608983/
   It is a modified copy of the upstream simulator. If you are curious, diff it against https://github.com/openstack/oslo.messaging/blob/master/tools/simulator.py
3. On the compute node, apply the patch at http://paste.openstack.org/show/608984/ to /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py

4. In that console, set the following variable:
   RABBIT_URL=rabbit://<user>:<pass>@<node_1_ip>:5673,<user>:<pass>@<node_2_ip>:5673,<user>:<pass>@<node_3_ip>:5673/
   Fill in <user>, <pass> and <node_x_ip> from the rabbit_userid, rabbit_password and rabbit_hosts parameters in /etc/nova/nova.conf, respectively.

5. Run
   python simulator.py --url $RABBIT_URL rpc-server -w 1

6. Open another console on the controller whose IP comes first in the RABBIT_URL list.
7. Open yet another console on the compute node and set the RABBIT_URL variable there as well.
8. There, run
   python simulator.py --url $RABBIT_URL rpc-client --timeout 10 -m 2 -w 10

   With this command, the simulator sends 2 messages (-m) with a 10-second timeout (--timeout) and a 10-second interval between messages (-w).

9. Wait for the simulator to send the first message and receive a response from the rpc-server. This has happened once the following lines appear in the console:
2017-05-05 14:05:22,129 DEBUG oslo_messaging._drivers.amqpdriver CALL msg_id: ...
2017-05-05 14:05:23,153 DEBUG oslo_messaging._drivers.amqpdriver received reply msg_id: ...

10. Once you see these lines, quickly (you have 10 seconds) switch to the controller console opened in step #6 and execute
    iptables -I OUTPUT 1 -p tcp --sport 5673 -j DROP

    This blocks RabbitMQ traffic leaving that node (packets with source port 5673).

11. Next, observe the following lines:
2017-05-05 14:11:10,076 DEBUG oslo_messaging._drivers.amqpdriver CALL msg_id: ...
2017-05-05 14:12:10,087 DEBUG oslo.messaging._drivers.impl_rabbit Received recoverable error from kombu
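The gap between the two timestamps in step 11 can be checked with a short script. This is a minimal sketch using only Python's standard library; it assumes each log line starts with a timestamp in the `YYYY-MM-DD HH:MM:SS,mmm` format shown above, and the helper name `seconds_between` is hypothetical.

```python
from datetime import datetime

# Timestamp format of the log lines above, e.g. "2017-05-05 14:11:10,076"
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(first_line, second_line):
    """Return the seconds elapsed between the timestamps of two log lines."""
    def parse(line):
        # The timestamp occupies the first 23 characters of each line.
        return datetime.strptime(line[:23], LOG_TS_FORMAT)
    return (parse(second_line) - parse(first_line)).total_seconds()


call = ("2017-05-05 14:11:10,076 DEBUG oslo_messaging._drivers.amqpdriver "
        "CALL msg_id: ...")
error = ("2017-05-05 14:12:10,087 DEBUG oslo.messaging._drivers.impl_rabbit "
         "Received recoverable error from kombu")
rpc_timeout = 10  # value passed to the client via --timeout

delay = seconds_between(call, error)
# prints: 60.011 s to detect the failure, 6.0x the RPC timeout
print("{:.3f} s to detect the failure, {:.1f}x the RPC timeout".format(
    delay, delay / rpc_timeout))
```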

What happens next does not matter. What matters is that oslo.messaging takes 60 seconds (from 14:11:10 to 14:12:10) to give up on the first attempt and reconnect, even though the RPC timeout is only 10 seconds. oslo.messaging should react to the problem much faster.
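Why a silent peer is slow to detect: the iptables DROP target discards packets without sending a RST or FIN, so nothing on the client side fails right away; a pending read simply waits until some timeout fires. The following minimal standard-library sketch illustrates that effect. It is only an analogy, not a reproduction of this bug, and it does not model kombu or oslo.messaging internals: a local listener accepts the connection but never writes, and the client's `recv()` returns only when its read timeout (a hypothetical 1 second here) expires. The function name `silent_peer_demo` is hypothetical.

```python
import socket
import time


def silent_peer_demo(read_timeout=1.0):
    """Show that a peer which never answers only surfaces as a timeout.

    A local listener accepts the TCP connection but never sends anything,
    loosely mimicking a broker whose outgoing packets are being dropped.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = server.accept()  # accepted, but nothing is ever written to it
    client.settimeout(read_timeout)

    start = time.monotonic()
    try:
        client.recv(1)
        outcome = "data"
    except socket.timeout:
        # No RST/FIN arrives, so only the read timeout ends the wait.
        outcome = "timeout"
    elapsed = time.monotonic() - start

    for s in (client, conn, server):
        s.close()
    return outcome, elapsed


if __name__ == "__main__":
    outcome, elapsed = silent_peer_demo(1.0)
    print(outcome, round(elapsed, 1))
```

In this sketch, how quickly the client notices the failure is set entirely by the read timeout it chose. Whichever timeout governs the broker connection in oslo.messaging plays the same role, and the report's point is that the reaction should be much closer to the 10-second RPC timeout than to 60 seconds.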