Activity log for bug #1459772

Date Who What changed Old value New value Message
2015-05-28 18:18:30 Dmitry Mescheryakov bug added bug
2015-05-28 18:18:36 Dmitry Mescheryakov mos: importance Undecided High
2015-05-28 18:18:39 Dmitry Mescheryakov mos: milestone 6.1
2015-05-28 18:26:41 Dmitry Mescheryakov description updated: appended the snapshot download link
2015-05-28 18:27:32 Dmitry Mescheryakov description updated: "haproxy sends swift request" -> "haproxy sends user's request for Swift"
2015-05-28 18:27:56 Dmitry Mescheryakov description updated: "It waits for response" -> "Haproxy waits for response"

Description (final text after the three edits above):

Version: 6.1, ISO #474. Full version available at http://paste.openstack.org/show/242594/

Steps to reproduce:
1. Install an environment with Swift with 3 controllers and 1 compute node.
2. Connect to some controller and disable the management network there using the following command:
       iptables -I INPUT -i br-mgmt -j DROP && iptables -I OUTPUT -o br-mgmt -j DROP
3. Connect to _another_ controller and run the command 'swift list' there 10 times. Sometimes the command takes much longer than usual - more than a minute. When that happens, the response returns in 70 seconds on average. It might happen every time, or every 2nd or 3rd time, depending on circumstances I do not understand.

Analysis: The issue occurs when haproxy sends a user's request for Swift to the firewalled node. Swift on that node tries to check the user's token and times out because it cannot connect to Keystone's admin URL (which is on the management net). Haproxy waits for a response for 1 minute and then resends the request to another node. As a result, the request takes slightly more than a minute to be processed.

A similar issue would happen with other OpenStack components, but haproxy detects that all services on the node except Swift are dead. Haproxy detects service failure by accessing each service's endpoint, which listens on the management (br-mgmt) network, which is firewalled. Swift's endpoint listens on the storage interface (br-storage), so haproxy thinks that Swift is alive on the firewalled node.

In general, the problem is that haproxy's health checks are too 'weak' - it is not enough to check that a service's port is accessible. We probably need to temporarily disable a service on a node if it constantly fails.

Attached is a snapshot of an environment in which the management interface of one node (node-2) was firewalled. In the haproxy log of node-1 you can see how swift requests were handled. Also, in the swift-proxy log of node-2 you can find swift trying to connect to keystone.

The snapshot can be downloaded from: https://drive.google.com/file/d/0B_TRgCViR_cIQVpLQXJ5aVlnUTQ/view?usp=sharing
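The 'weak health check' failure mode in the description can be sketched with a small self-contained demo: a backend that accepts TCP connections (so a port-level check passes) but never answers requests, analogous to Swift accepting connections on br-storage while its Keystone dependency on br-mgmt is firewalled. This is an illustrative model only; all names and ports below are made up and nothing here is taken from the attached snapshot.

```python
import socket
import threading

def half_dead_backend(server: socket.socket) -> None:
    # Accept the connection so the port looks alive...
    conn, _ = server.accept()
    # ...then hang without replying, as if stuck waiting on Keystone.
    threading.Event().wait(2)
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))   # any free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=half_dead_backend, args=(server,), daemon=True).start()

# Port-level check (what a plain TCP health check sees): the connect succeeds.
probe = socket.create_connection(("127.0.0.1", port), timeout=1)
tcp_open = True

# Application-level check: send an HTTP request and wait for an answer.
probe.sendall(b"GET /healthcheck HTTP/1.0\r\n\r\n")
probe.settimeout(1)
http_replied = False
try:
    http_replied = bool(probe.recv(1024))
except socket.timeout:
    pass  # no reply within the timeout: the backend is not actually healthy
probe.close()

print("TCP check passed:", tcp_open)
print("HTTP check passed:", http_replied)
```

In haproxy terms, one direction consistent with the analysis above is to replace the bare TCP port check with an HTTP-level check (haproxy's `option httpchk` directive), so that a backend whose dependencies are unreachable stops receiving traffic even though its port still accepts connections.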
2015-05-28 18:35:11 Dmitry Mescheryakov mos: assignee Fuel Library Team (fuel-library)
2015-05-28 18:35:14 Dmitry Mescheryakov mos: status New Confirmed
2015-05-29 07:16:47 Vladimir Kuklin tags to-be-covered-by-system-tests
2015-05-29 07:20:45 Vladimir Kuklin bug task added fuel
2015-05-29 07:20:57 Vladimir Kuklin fuel: assignee Fuel Library Team (fuel-library)
2015-05-29 07:21:02 Vladimir Kuklin fuel: milestone 6.1
2015-05-29 07:21:05 Vladimir Kuklin fuel: importance Undecided High
2015-05-29 07:21:07 Vladimir Kuklin fuel: status New Triaged
2015-05-29 07:21:11 Vladimir Kuklin bug task deleted mos
2015-05-29 07:42:18 Bogdan Dobrelya tags to-be-covered-by-system-tests low-hanging-fruit to-be-covered-by-system-tests
2015-05-29 07:50:04 Nastya Urlapova tags low-hanging-fruit to-be-covered-by-system-tests
2015-05-29 08:22:07 Bogdan Dobrelya tags low-hanging-fruit
2015-05-29 09:38:16 Bogdan Dobrelya fuel: assignee Fuel Library Team (fuel-library) Bogdan Dobrelya (bogdando)
2015-05-29 09:56:58 Bogdan Dobrelya nominated for series fuel/6.0.x
2015-05-29 09:56:58 Bogdan Dobrelya bug task added fuel/6.0.x
2015-05-29 09:57:03 Bogdan Dobrelya fuel/6.0.x: status New Triaged
2015-05-29 09:57:05 Bogdan Dobrelya fuel/6.0.x: milestone 6.0.2
2015-05-29 09:57:07 Bogdan Dobrelya fuel/6.0.x: importance Undecided High
2015-05-29 09:57:14 Bogdan Dobrelya fuel/6.0.x: assignee Fuel Library Team (fuel-library)
2015-05-29 10:04:43 Bogdan Dobrelya tags low-hanging-fruit low-hanging-fruit to-be-covered-by-tests
2015-05-29 10:07:44 Bogdan Dobrelya tags low-hanging-fruit to-be-covered-by-tests low-hanging-fruit swift to-be-covered-by-tests
2015-05-29 11:04:23 Bogdan Dobrelya tags low-hanging-fruit swift to-be-covered-by-tests swift to-be-covered-by-tests
2015-05-29 16:06:23 OpenStack Infra fuel: status Triaged In Progress
2015-06-01 08:36:03 Bogdan Dobrelya bug added subscriber Andrey Sledzinskiy
2015-06-01 10:59:43 Vladimir Kuklin bug task added mos
2015-06-01 10:59:49 Vladimir Kuklin mos: status New Triaged
2015-06-01 10:59:54 Vladimir Kuklin mos: importance Undecided High
2015-06-01 11:00:01 Vladimir Kuklin mos: assignee MOS Swift (mos-swift)
2015-06-01 11:00:03 Vladimir Kuklin mos: milestone 7.0
2015-06-01 11:57:48 Bogdan Dobrelya summary Disabling management net on a single swift node leads to a very long swift response time Disabling management net on a single swift proxy node leads to a very long swift response time
2015-06-01 14:58:20 OpenStack Infra fuel: assignee Bogdan Dobrelya (bogdando) Vladimir Kuklin (vkuklin)
2015-06-01 18:38:44 OpenStack Infra fuel: status In Progress Fix Committed
2015-06-02 09:19:38 Bogdan Dobrelya fuel: assignee Vladimir Kuklin (vkuklin) Bogdan Dobrelya (bogdando)
2015-07-13 10:15:10 Bogdan Dobrelya fuel/6.0.x: assignee Fuel Library Team (fuel-library) MOS Sustaining (mos-sustaining)
2015-08-20 13:52:08 Timur Nurlygayanov mos: assignee MOS Swift (mos-swift) Fuel Library Team (fuel-library)
2015-08-20 13:52:28 Timur Nurlygayanov mos: assignee Fuel Library Team (fuel-library) Vladimir Kuklin (vkuklin)
2015-08-20 13:52:31 Timur Nurlygayanov mos: status Triaged Fix Committed
2015-09-07 09:12:15 Alexander Arzhanov tags swift to-be-covered-by-tests on-verification swift to-be-covered-by-tests
2015-09-08 11:51:28 Alexander Arzhanov mos: status Fix Committed Fix Released
2015-09-08 11:51:41 Alexander Arzhanov tags on-verification swift to-be-covered-by-tests swift to-be-covered-by-tests
2015-09-09 16:09:17 Vitaly Sedelnik nominated for series fuel/6.0-updates
2015-09-09 16:09:17 Vitaly Sedelnik bug task added fuel/6.0-updates
2015-09-09 16:09:52 Vitaly Sedelnik fuel/6.0-updates: status New Triaged
2015-09-09 16:09:54 Vitaly Sedelnik fuel/6.0-updates: importance Undecided High
2015-09-09 16:10:04 Vitaly Sedelnik fuel/6.0-updates: assignee MOS Maintenance (mos-maintenance)
2015-09-09 16:10:09 Vitaly Sedelnik fuel/6.0-updates: milestone 6.0-updates
2015-09-09 16:10:16 Vitaly Sedelnik fuel/6.0.x: status Triaged Won't Fix
2015-09-26 10:57:10 Vitaly Sedelnik fuel/6.0.x: milestone 6.0.2 6.0.1