Activity log for bug #1459772

Date Who What changed Old value New value Message
2015-05-28 18:18:30 Dmitry Mescheryakov bug added bug
2015-05-28 18:18:36 Dmitry Mescheryakov mos: importance Undecided High
2015-05-28 18:18:39 Dmitry Mescheryakov mos: milestone 6.1
2015-05-28 18:26:41 Dmitry Mescheryakov description updated: appended the snapshot download link
2015-05-28 18:27:32 Dmitry Mescheryakov description updated: "haproxy sends swift request" -> "haproxy sends user's request for Swift"
2015-05-28 18:27:56 Dmitry Mescheryakov description updated: "It waits for response" -> "Haproxy waits for response"

Description (final text after the three edits above):

Version: 6.1, ISO #474. Full version available at http://paste.openstack.org/show/242594/

Steps to reproduce:
1. Install an environment with Swift with 3 controllers and 1 compute node.
2. Connect to some controller and disable the management network there using the following command:
       iptables -I INPUT -i br-mgmt -j DROP && iptables -I OUTPUT -o br-mgmt -j DROP
3. Connect to _another_ controller and run the command 'swift list' there 10 times. Sometimes the command takes much longer than usual - more than a minute. When that happens, the response returns in 70 seconds on average. It might happen every time, or every 2nd or 3rd time, depending on circumstances I do not understand.

Analysis: The issue occurs when haproxy sends a user's request for Swift to the firewalled node. Swift on that node tries to check the user's token and times out because it cannot connect to Keystone's admin URL (which is on the management net). Haproxy waits for a response for 1 minute and then resends the request to another node. As a result, the request takes slightly more than a minute to be processed.

A similar issue would happen with other OpenStack components, but haproxy detects that all services on the node except Swift are dead. Haproxy detects service failure by accessing each service's endpoint, which listens on the management (br-mgmt) network, which is firewalled. Swift's endpoint listens on the storage interface (br-storage), so haproxy thinks that Swift is alive on the firewalled node.

In general, the problem is that haproxy's health checks are too 'weak' - it is not enough to check that a service's port is accessible. We probably need to temporarily disable a service on a node if it constantly fails.

Attached is a snapshot of an environment in which the management interface of one node (node-2) was firewalled. In the haproxy log of node-1 you can see how swift requests were handled. Also, in the swift-proxy log of node-2 you can find swift trying to connect to keystone.

The snapshot can be downloaded from: https://drive.google.com/file/d/0B_TRgCViR_cIQVpLQXJ5aVlnUTQ/view?usp=sharing
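The 'weak health check' failure mode in the description can be sketched with a small self-contained demo: a backend that accepts TCP connections (so a port-level check passes) but never answers requests, analogous to Swift accepting connections on br-storage while its Keystone dependency on br-mgmt is firewalled. This is an illustrative model only; all names and ports below are made up and nothing here is taken from the attached snapshot.

```python
import socket
import threading

def half_dead_backend(server: socket.socket) -> None:
    # Accept the connection so the port looks alive...
    conn, _ = server.accept()
    # ...then hang without replying, as if stuck waiting on Keystone.
    threading.Event().wait(2)
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))   # any free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=half_dead_backend, args=(server,), daemon=True).start()

# Port-level check (what a plain TCP health check sees): the connect succeeds.
probe = socket.create_connection(("127.0.0.1", port), timeout=1)
tcp_open = True

# Application-level check: send an HTTP request and wait for an answer.
probe.sendall(b"GET /healthcheck HTTP/1.0\r\n\r\n")
probe.settimeout(1)
http_replied = False
try:
    http_replied = bool(probe.recv(1024))
except socket.timeout:
    pass  # no reply within the timeout: the backend is not actually healthy
probe.close()

print("TCP check passed:", tcp_open)
print("HTTP check passed:", http_replied)
```

In haproxy terms, one direction consistent with the analysis above is to replace the bare TCP port check with an HTTP-level check (haproxy's `option httpchk` directive), so that a backend whose dependencies are unreachable stops receiving traffic even though its port still accepts connections.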
2015-05-28 18:35:11 Dmitry Mescheryakov mos: assignee Fuel Library Team (fuel-library)
2015-05-28 18:35:14 Dmitry Mescheryakov mos: status New Confirmed
2015-05-29 07:16:47 Vladimir Kuklin tags to-be-covered-by-system-tests
2015-05-29 07:20:45 Vladimir Kuklin bug task added fuel
2015-05-29 07:20:57 Vladimir Kuklin fuel: assignee Fuel Library Team (fuel-library)
2015-05-29 07:21:02 Vladimir Kuklin fuel: milestone 6.1
2015-05-29 07:21:05 Vladimir Kuklin fuel: importance Undecided High
2015-05-29 07:21:07 Vladimir Kuklin fuel: status New Triaged
2015-05-29 07:21:11 Vladimir Kuklin bug task deleted mos
2015-05-29 07:42:18 Bogdan Dobrelya tags to-be-covered-by-system-tests low-hanging-fruit to-be-covered-by-system-tests
2015-05-29 07:50:04 Nastya Urlapova tags low-hanging-fruit to-be-covered-by-system-tests
2015-05-29 08:22:07 Bogdan Dobrelya tags low-hanging-fruit
2015-05-29 09:38:16 Bogdan Dobrelya fuel: assignee Fuel Library Team (fuel-library) Bogdan Dobrelya (bogdando)
2015-05-29 09:56:58 Bogdan Dobrelya nominated for series fuel/6.0.x
2015-05-29 09:56:58 Bogdan Dobrelya bug task added fuel/6.0.x
2015-05-29 09:57:03 Bogdan Dobrelya fuel/6.0.x: status New Triaged
2015-05-29 09:57:05 Bogdan Dobrelya fuel/6.0.x: milestone 6.0.2
2015-05-29 09:57:07 Bogdan Dobrelya fuel/6.0.x: importance Undecided High
2015-05-29 09:57:14 Bogdan Dobrelya fuel/6.0.x: assignee Fuel Library Team (fuel-library)
2015-05-29 10:04:43 Bogdan Dobrelya tags low-hanging-fruit low-hanging-fruit to-be-covered-by-tests
2015-05-29 10:07:44 Bogdan Dobrelya tags low-hanging-fruit to-be-covered-by-tests low-hanging-fruit swift to-be-covered-by-tests
2015-05-29 11:04:23 Bogdan Dobrelya tags low-hanging-fruit swift to-be-covered-by-tests swift to-be-covered-by-tests
2015-05-29 16:06:23 OpenStack Infra fuel: status Triaged In Progress
2015-06-01 08:36:03 Bogdan Dobrelya bug added subscriber Andrey Sledzinskiy
2015-06-01 10:59:43 Vladimir Kuklin bug task added mos
2015-06-01 10:59:49 Vladimir Kuklin mos: status New Triaged
2015-06-01 10:59:54 Vladimir Kuklin mos: importance Undecided High
2015-06-01 11:00:01 Vladimir Kuklin mos: assignee MOS Swift (mos-swift)
2015-06-01 11:00:03 Vladimir Kuklin mos: milestone 7.0
2015-06-01 11:57:48 Bogdan Dobrelya summary Disabling management net on a single swift node leads to a very long swift response time Disabling management net on a single swift proxy node leads to a very long swift response time
2015-06-01 14:58:20 OpenStack Infra fuel: assignee Bogdan Dobrelya (bogdando) Vladimir Kuklin (vkuklin)
2015-06-01 18:38:44 OpenStack Infra fuel: status In Progress Fix Committed
2015-06-02 09:19:38 Bogdan Dobrelya fuel: assignee Vladimir Kuklin (vkuklin) Bogdan Dobrelya (bogdando)
2015-07-13 10:15:10 Bogdan Dobrelya fuel/6.0.x: assignee Fuel Library Team (fuel-library) MOS Sustaining (mos-sustaining)
2015-08-20 13:52:08 Timur Nurlygayanov mos: assignee MOS Swift (mos-swift) Fuel Library Team (fuel-library)
2015-08-20 13:52:28 Timur Nurlygayanov mos: assignee Fuel Library Team (fuel-library) Vladimir Kuklin (vkuklin)
2015-08-20 13:52:31 Timur Nurlygayanov mos: status Triaged Fix Committed
2015-09-07 09:12:15 Alexander Arzhanov tags swift to-be-covered-by-tests on-verification swift to-be-covered-by-tests
2015-09-08 11:51:28 Alexander Arzhanov mos: status Fix Committed Fix Released
2015-09-08 11:51:41 Alexander Arzhanov tags on-verification swift to-be-covered-by-tests swift to-be-covered-by-tests
2015-09-09 16:09:17 Vitaly Sedelnik nominated for series fuel/6.0-updates
2015-09-09 16:09:17 Vitaly Sedelnik bug task added fuel/6.0-updates
2015-09-09 16:09:52 Vitaly Sedelnik fuel/6.0-updates: status New Triaged
2015-09-09 16:09:54 Vitaly Sedelnik fuel/6.0-updates: importance Undecided High
2015-09-09 16:10:04 Vitaly Sedelnik fuel/6.0-updates: assignee MOS Maintenance (mos-maintenance)
2015-09-09 16:10:09 Vitaly Sedelnik fuel/6.0-updates: milestone 6.0-updates
2015-09-09 16:10:16 Vitaly Sedelnik fuel/6.0.x: status Triaged Won't Fix
2015-09-26 10:57:10 Vitaly Sedelnik fuel/6.0.x: milestone 6.0.2 6.0.1