Activity log for bug #1689801

Date Who What changed Old value New value Message
2017-05-10 11:31:26 Dmitry Mescheryakov bug added bug
2017-05-10 11:35:05 Dmitry Mescheryakov mos: importance Undecided Medium
2017-05-10 11:35:08 Dmitry Mescheryakov mos: status New Confirmed
2017-05-10 11:35:10 Dmitry Mescheryakov mos: assignee Dmitry Mescheryakov (dmitrymex)
2017-05-10 11:35:12 Dmitry Mescheryakov mos: milestone 9.x-updates
2017-05-10 11:42:11 Dmitry Mescheryakov description
Old value:
Version: 9.x
Steps to reproduce:
1. Deploy a MOS env with 3 controllers and 1 compute node
2. Download that file and save it as simulator.py: http://paste.openstack.org/show/608983/
   That is a modified copy of upstream simulator, if you are curious, make a diff against https://github.com/openstack/oslo.messaging/blob/master/tools/simulator.py
3. Go to compute node and apply that patch http://paste.openstack.org/show/608984/ to /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py
4. In console set the following variable:
   RABBIT_URL=rabbit://<user>:<pass>@<node_1_ip>:5673,<user>:<pass>@<node_2_ip>:5673,<user>:<pass>@<node_3_ip>:5673/
   Populate user, pass and node_x_ip using the following parameters from /etc/nova/nova.conf: rabbit_hosts, rabbit_userid and rabbit_password
5. Open another console to controller, which IP goes first in RABBIT_URL list.
6. Open yet another console to the compute node and populate RABBIT_URL variable here as well.
7. Here run
   python simulator.py --url $RABBIT_URL rpc-client -m 2 -w 10 --is-cast true
   With that command simulator will send 2 messages (-m) with interval between messages 10 seconds (-w) in 'cast' mode, hence you don't need rpc server.
8. Wait for simulator to send the first message and receives response from rpc-server. It is done ones the following lines appear in console:
   2017-05-10 11:21:53,661 DEBUG oslo_messaging._drivers.amqpdriver CAST unique_id: ...
9. Once you see these lines, quickly (you have 10 seconds to do that) switch to controller console opened in step #6 and here execute
   iptables -I OUTPUT 1 -p tcp --sport 5673 -j DROP
   That will block Rabbit traffic to that node.
10. Observe the following lines next:
   2017-05-10 11:22:03,677 DEBUG oslo_messaging._drivers.amqpdriver CAST unique_id:
   2017-05-10 11:23:03,701 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 192.168.0.3:5673 is unreachable: ... ...
   2017-05-10 11:23:09,735 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 192.168.0.3:5673 is unreachable: ...
   2017-05-10 11:23:48,365 INFO oslo.messaging._drivers.impl_rabbit Reconnected to AMQP server ... ...
Note that 40 seconds pass between last 'server is unreachable' complaint and reconnect. That is artificial delay caused by bug in code,
To remove iptables rule set in step #10 on controller execute
   iptables -D OUTPUT -p tcp --sport 5673 -j DROP
New value:
Version: 9.x
Steps to reproduce:
1. Deploy a MOS env with 3 controllers and 1 compute node
2. Download that file and save it as simulator.py: http://paste.openstack.org/show/608983/
   That is a modified copy of upstream simulator, if you are curious, make a diff against https://github.com/openstack/oslo.messaging/blob/master/tools/simulator.py
3. Go to compute node and apply that patch http://paste.openstack.org/show/608984/ to /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py
4. In that console set the following variable:
   RABBIT_URL=rabbit://<user>:<pass>@<node_1_ip>:5673,<user>:<pass>@<node_2_ip>:5673,<user>:<pass>@<node_3_ip>:5673/
   Populate user, pass and node_x_ip using the following parameters from /etc/nova/nova.conf: rabbit_hosts, rabbit_userid and rabbit_password
5. Open another console to controller, which IP goes first in RABBIT_URL list.
6. Return to console opened in step #3
7. Here run
   python simulator.py --url $RABBIT_URL rpc-client -m 2 -w 10 --is-cast true
   With that command simulator will send 2 messages (-m) with interval between messages 10 seconds (-w) in 'cast' mode, hence you don't need rpc server.
8. Wait for simulator to send the first message. It is done ones the following lines appear in console:
   2017-05-10 11:21:53,661 DEBUG oslo_messaging._drivers.amqpdriver CAST unique_id: ...
9. Once you see these lines, quickly (you have 10 seconds to do that) switch to controller console opened in step #5 and here execute
   iptables -I OUTPUT 1 -p tcp --sport 5673 -j DROP
   That will block AMQP traffic to that node.
10. Observe the following lines next:
   2017-05-10 11:22:03,677 DEBUG oslo_messaging._drivers.amqpdriver CAST unique_id:
   2017-05-10 11:23:03,701 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 192.168.0.3:5673 is unreachable: ... ...
   2017-05-10 11:23:09,735 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 192.168.0.3:5673 is unreachable: ...
   2017-05-10 11:23:48,365 INFO oslo.messaging._drivers.impl_rabbit Reconnected to AMQP server ... ...
Note that 40 seconds pass between last 'server is unreachable' complaint and reconnect. That is an artificial delay caused by bug in code.
To remove iptables rule set in step #10 on controller execute
   iptables -D OUTPUT -p tcp --sport 5673 -j DROP
That is bug is very similar to https://bugs.launchpad.net/mos/+bug/1688581 and it would be easier to verify them together.
2017-05-10 11:43:12 Dmitry Mescheryakov tags area-oslo customer-found
2017-05-10 11:47:02 Dmitry Mescheryakov description
Old value:
Version: 9.x
Steps to reproduce:
1. Deploy a MOS env with 3 controllers and 1 compute node
2. Download that file and save it as simulator.py: http://paste.openstack.org/show/608983/
   That is a modified copy of upstream simulator, if you are curious, make a diff against https://github.com/openstack/oslo.messaging/blob/master/tools/simulator.py
3. Go to compute node and apply that patch http://paste.openstack.org/show/608984/ to /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py
4. In that console set the following variable:
   RABBIT_URL=rabbit://<user>:<pass>@<node_1_ip>:5673,<user>:<pass>@<node_2_ip>:5673,<user>:<pass>@<node_3_ip>:5673/
   Populate user, pass and node_x_ip using the following parameters from /etc/nova/nova.conf: rabbit_hosts, rabbit_userid and rabbit_password
5. Open another console to controller, which IP goes first in RABBIT_URL list.
6. Return to console opened in step #3
7. Here run
   python simulator.py --url $RABBIT_URL rpc-client -m 2 -w 10 --is-cast true
   With that command simulator will send 2 messages (-m) with interval between messages 10 seconds (-w) in 'cast' mode, hence you don't need rpc server.
8. Wait for simulator to send the first message. It is done ones the following lines appear in console:
   2017-05-10 11:21:53,661 DEBUG oslo_messaging._drivers.amqpdriver CAST unique_id: ...
9. Once you see these lines, quickly (you have 10 seconds to do that) switch to controller console opened in step #5 and here execute
   iptables -I OUTPUT 1 -p tcp --sport 5673 -j DROP
   That will block AMQP traffic to that node.
10. Observe the following lines next:
   2017-05-10 11:22:03,677 DEBUG oslo_messaging._drivers.amqpdriver CAST unique_id:
   2017-05-10 11:23:03,701 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 192.168.0.3:5673 is unreachable: ... ...
   2017-05-10 11:23:09,735 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 192.168.0.3:5673 is unreachable: ...
   2017-05-10 11:23:48,365 INFO oslo.messaging._drivers.impl_rabbit Reconnected to AMQP server ... ...
Note that 40 seconds pass between last 'server is unreachable' complaint and reconnect. That is an artificial delay caused by bug in code.
To remove iptables rule set in step #10 on controller execute
   iptables -D OUTPUT -p tcp --sport 5673 -j DROP
That is bug is very similar to https://bugs.launchpad.net/mos/+bug/1688581 and it would be easier to verify them together.
New value:
Version: 9.x
Steps to reproduce:
1. Deploy a MOS env with 3 controllers and 1 compute node
2. Download that file and save it as simulator.py: http://paste.openstack.org/show/608983/
   That is a modified copy of upstream simulator, if you are curious, make a diff against https://github.com/openstack/oslo.messaging/blob/master/tools/simulator.py
3. Go to compute node and apply that patch http://paste.openstack.org/show/608984/ to /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py
4. In that console set the following variable:
   RABBIT_URL=rabbit://<user>:<pass>@<node_1_ip>:5673,<user>:<pass>@<node_2_ip>:5673,<user>:<pass>@<node_3_ip>:5673/
   Populate user, pass and node_x_ip using the following parameters from /etc/nova/nova.conf: rabbit_hosts, rabbit_userid and rabbit_password
5. Open another console to controller, which IP goes first in RABBIT_URL list.
6. Return to console opened in step #3
7. Here run
   python simulator.py --url $RABBIT_URL rpc-client -m 2 -w 10 --is-cast true
   With that command simulator will send 2 messages (-m) with interval between messages 10 seconds (-w) in 'cast' mode, hence you don't need rpc server.
8. Wait for simulator to send the first message. It is done ones the following lines appear in console:
   2017-05-10 11:21:53,661 DEBUG oslo_messaging._drivers.amqpdriver CAST unique_id: ...
9. Once you see these lines, quickly (you have 10 seconds to do that) switch to controller console opened in step #5 and here execute
   iptables -I OUTPUT 1 -p tcp --sport 5673 -j DROP
   That will block AMQP traffic to that node.
10. Observe the following lines next:
   2017-05-10 11:22:03,677 DEBUG oslo_messaging._drivers.amqpdriver CAST unique_id:
   2017-05-10 11:23:03,701 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 192.168.0.3:5673 is unreachable: ... ...
   2017-05-10 11:23:09,735 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 192.168.0.3:5673 is unreachable: ...
   2017-05-10 11:23:48,365 INFO oslo.messaging._drivers.impl_rabbit Reconnected to AMQP server ... ...
Note that 40 seconds pass between last 'server is unreachable' complaint and reconnect. That is an artificial delay caused by bug in code.
To remove iptables rule set in step #10 on controller execute
   iptables -D OUTPUT -p tcp --sport 5673 -j DROP
That bug is very similar to https://bugs.launchpad.net/mos/+bug/1688581 and it would be easier to verify them together.
2017-05-10 11:52:03 Dmitry Mescheryakov description
Old value:
Version: 9.x
Steps to reproduce:
1. Deploy a MOS env with 3 controllers and 1 compute node
2. Download that file and save it as simulator.py: http://paste.openstack.org/show/608983/
   That is a modified copy of upstream simulator, if you are curious, make a diff against https://github.com/openstack/oslo.messaging/blob/master/tools/simulator.py
3. Go to compute node and apply that patch http://paste.openstack.org/show/608984/ to /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py
4. In that console set the following variable:
   RABBIT_URL=rabbit://<user>:<pass>@<node_1_ip>:5673,<user>:<pass>@<node_2_ip>:5673,<user>:<pass>@<node_3_ip>:5673/
   Populate user, pass and node_x_ip using the following parameters from /etc/nova/nova.conf: rabbit_hosts, rabbit_userid and rabbit_password
5. Open another console to controller, which IP goes first in RABBIT_URL list.
6. Return to console opened in step #3
7. Here run
   python simulator.py --url $RABBIT_URL rpc-client -m 2 -w 10 --is-cast true
   With that command simulator will send 2 messages (-m) with interval between messages 10 seconds (-w) in 'cast' mode, hence you don't need rpc server.
8. Wait for simulator to send the first message. It is done ones the following lines appear in console:
   2017-05-10 11:21:53,661 DEBUG oslo_messaging._drivers.amqpdriver CAST unique_id: ...
9. Once you see these lines, quickly (you have 10 seconds to do that) switch to controller console opened in step #5 and here execute
   iptables -I OUTPUT 1 -p tcp --sport 5673 -j DROP
   That will block AMQP traffic to that node.
10. Observe the following lines next:
   2017-05-10 11:22:03,677 DEBUG oslo_messaging._drivers.amqpdriver CAST unique_id:
   2017-05-10 11:23:03,701 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 192.168.0.3:5673 is unreachable: ... ...
   2017-05-10 11:23:09,735 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 192.168.0.3:5673 is unreachable: ...
   2017-05-10 11:23:48,365 INFO oslo.messaging._drivers.impl_rabbit Reconnected to AMQP server ... ...
Note that 40 seconds pass between last 'server is unreachable' complaint and reconnect. That is an artificial delay caused by bug in code.
To remove iptables rule set in step #10 on controller execute
   iptables -D OUTPUT -p tcp --sport 5673 -j DROP
That bug is very similar to https://bugs.launchpad.net/mos/+bug/1688581 and it would be easier to verify them together.
New value:
Version: 9.x
Steps to reproduce:
1. Deploy a MOS env with 3 controllers and 1 compute node
2. Download that file and save it as simulator.py: http://paste.openstack.org/show/608983/
   That is a modified copy of upstream simulator, if you are curious, make a diff against https://github.com/openstack/oslo.messaging/blob/master/tools/simulator.py
3. Go to compute node and apply that patch http://paste.openstack.org/show/608984/ to /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py
4. In that console set the following variable:
   RABBIT_URL=rabbit://<user>:<pass>@<node_1_ip>:5673,<user>:<pass>@<node_2_ip>:5673,<user>:<pass>@<node_3_ip>:5673/
   Populate user, pass and node_x_ip using the following parameters from /etc/nova/nova.conf: rabbit_hosts, rabbit_userid and rabbit_password
5. Open another console to controller, which IP goes first in RABBIT_URL list.
6. Return to console opened in step #3
7. Here run
   python simulator.py --url $RABBIT_URL rpc-client -m 2 -w 10 --is-cast true
   With that command simulator will send 2 messages (-m) with interval between messages 10 seconds (-w) in 'cast' mode, hence you don't need rpc server.
8. Wait for simulator to send the first message. It is done ones the following lines appear in console:
   2017-05-10 11:21:53,661 DEBUG oslo_messaging._drivers.amqpdriver CAST unique_id: ...
9. Once you see these lines, quickly (you have 10 seconds to do that) switch to controller console opened in step #5 and here execute
   iptables -I OUTPUT 1 -p tcp --sport 5673 -j DROP
   That will block AMQP traffic to that node.
10. Observe the following lines next:
   2017-05-10 11:22:03,677 DEBUG oslo_messaging._drivers.amqpdriver CAST unique_id:
   2017-05-10 11:23:03,701 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 192.168.0.3:5673 is unreachable: ... ...
   2017-05-10 11:23:09,735 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 192.168.0.3:5673 is unreachable: ...
   2017-05-10 11:23:48,365 INFO oslo.messaging._drivers.impl_rabbit Reconnected to AMQP server ... ...
Note that 40 seconds pass between last 'server is unreachable' complaint and reconnect. That is an artificial delay caused by bug in code.
To remove iptables rule set in step #9 on controller execute
   iptables -D OUTPUT -p tcp --sport 5673 -j DROP
That bug is very similar to https://bugs.launchpad.net/mos/+bug/1688581 and it would be easier to verify them together.
2017-05-10 11:55:33 Dmitry Mescheryakov summary oslo.messaging delays reconnection trying to close old channel oslo.messaging delays reconnect trying to close old channel
2017-05-19 09:32:54 Dmitry Mescheryakov mos: status Confirmed In Progress
2017-06-19 12:41:27 Denis Meltsaykin mos: milestone 9.x-updates 9.2-mu-3
2017-07-03 14:07:50 Fuel Devops McRobotson mos: status In Progress Fix Committed
2017-10-13 09:18:11 Ilya Bumarskov mos: status Fix Committed Fix Released
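
Verification note: the description judges the bug by the gap between the last 'is unreachable' error and the 'Reconnected to AMQP server' line in the simulator output. A minimal sketch of measuring that gap from captured log lines follows. The helper names (parse_ts, reconnect_delay) are hypothetical and not part of oslo.messaging or the simulator, and the sample lines are the ones quoted in step 10, with the elided parts left as "...".

```python
from datetime import datetime

# Log lines quoted in step 10 of the description; "..." marks text elided there.
LOG_LINES = [
    "2017-05-10 11:23:03,701 ERROR oslo.messaging._drivers.impl_rabbit "
    "AMQP server on 192.168.0.3:5673 is unreachable: ...",
    "2017-05-10 11:23:09,735 ERROR oslo.messaging._drivers.impl_rabbit "
    "AMQP server on 192.168.0.3:5673 is unreachable: ...",
    "2017-05-10 11:23:48,365 INFO oslo.messaging._drivers.impl_rabbit "
    "Reconnected to AMQP server ...",
]


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp (23 characters)."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def reconnect_delay(lines):
    """Seconds between the last 'is unreachable' line and the first
    'Reconnected to AMQP server' line after it, or None if no such pair."""
    last_unreachable = None
    for line in lines:
        if "is unreachable" in line:
            last_unreachable = parse_ts(line)
        elif "Reconnected to AMQP server" in line and last_unreachable is not None:
            return (parse_ts(line) - last_unreachable).total_seconds()
    return None


if __name__ == "__main__":
    print(reconnect_delay(LOG_LINES))
```

On the quoted lines this gives about 38.6 seconds, which is the "40 seconds" delay mentioned in the description. With the fix applied, the same measurement should presumably be much shorter.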