2015-05-28 18:18:30 |
Dmitry Mescheryakov |
bug |
|
|
added bug |
2015-05-28 18:18:36 |
Dmitry Mescheryakov |
mos: importance |
Undecided |
High |
|
2015-05-28 18:18:39 |
Dmitry Mescheryakov |
mos: milestone |
|
6.1 |
|
2015-05-28 18:26:41 |
Dmitry Mescheryakov |
description |
Version: 6.1, ISO #474.
The full version is available at http://paste.openstack.org/show/242594/
Steps to reproduce:
1. Install an environment with Swift, with 3 controllers and 1 compute node.
2. Connect to one of the controllers and disable the management network there using the following command:
iptables -I INPUT -i br-mgmt -j DROP && iptables -I OUTPUT -o br-mgmt -j DROP
3. Connect to _another_ controller and run the 'swift list' command 10 times there.
Sometimes the command takes a long time: more than a minute. On average, when it happens, the response returns in 70 seconds. It might happen every time, or every 2nd or 3rd time, depending on circumstances I do not understand.
Analysis:
The issue occurs when haproxy sends the Swift request to the firewalled node. Swift on that node tries to validate the user's token and times out because it cannot connect to Keystone's admin URL (which is on the management network). It waits for a response for 1 minute and then resends the request to another node. As a result, the request takes slightly more than a minute to be processed.
A similar issue would happen with other OpenStack components, but haproxy detects that all services on the node except Swift are dead. Haproxy detects service failure by accessing each service's endpoint, which listens on the management (br-mgmt) network, which is firewalled. Swift's endpoint listens on the storage interface (br-storage), so haproxy thinks that Swift is alive on the firewalled node.
In general, the problem is that haproxy's health checks are too 'weak': it is not enough to check that a service's port is accessible. We probably need to temporarily disable a service on a node if it constantly fails.
Attached is a snapshot of the environment, in which the management interface of one node (node-2) was firewalled. In the haproxy log of node-1 you can see how Swift requests were handled. In the swift-proxy log of node-2 you can see Swift trying to connect to Keystone. |
Version: 6.1, ISO #474.
The full version is available at http://paste.openstack.org/show/242594/
Steps to reproduce:
1. Install an environment with Swift, with 3 controllers and 1 compute node.
2. Connect to one of the controllers and disable the management network there using the following command:
iptables -I INPUT -i br-mgmt -j DROP && iptables -I OUTPUT -o br-mgmt -j DROP
3. Connect to _another_ controller and run the 'swift list' command 10 times there.
Sometimes the command takes a long time: more than a minute. On average, when it happens, the response returns in 70 seconds. It might happen every time, or every 2nd or 3rd time, depending on circumstances I do not understand.
Analysis:
The issue occurs when haproxy sends the Swift request to the firewalled node. Swift on that node tries to validate the user's token and times out because it cannot connect to Keystone's admin URL (which is on the management network). It waits for a response for 1 minute and then resends the request to another node. As a result, the request takes slightly more than a minute to be processed.
A similar issue would happen with other OpenStack components, but haproxy detects that all services on the node except Swift are dead. Haproxy detects service failure by accessing each service's endpoint, which listens on the management (br-mgmt) network, which is firewalled. Swift's endpoint listens on the storage interface (br-storage), so haproxy thinks that Swift is alive on the firewalled node.
In general, the problem is that haproxy's health checks are too 'weak': it is not enough to check that a service's port is accessible. We probably need to temporarily disable a service on a node if it constantly fails.
Attached is a snapshot of the environment, in which the management interface of one node (node-2) was firewalled. In the haproxy log of node-1 you can see how Swift requests were handled. In the swift-proxy log of node-2 you can see Swift trying to connect to Keystone. The snapshot can be downloaded from: https://drive.google.com/file/d/0B_TRgCViR_cIQVpLQXJ5aVlnUTQ/view?usp=sharing |
|
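The timed repetition in step 3 of the description above can be sketched as a small POSIX shell helper (a sketch only: the `swift` CLI and configured credentials are assumed to be present on the controller; the function name `time_runs` is illustrative):

```shell
# time_runs N CMD... : run CMD N times, printing the elapsed wall-clock
# seconds of each run. Requests that haproxy routes to the firewalled
# node would be expected to show ~70 s here instead of ~1 s.
time_runs() {
    n=$1; shift
    i=1
    while [ "$i" -le "$n" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $((end - start)) s"
        i=$((i + 1))
    done
}

# Reproduction from step 3 (run on a non-firewalled controller):
# time_runs 10 swift list
```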
2015-05-28 18:27:32 |
Dmitry Mescheryakov |
description |
Version: 6.1, ISO #474.
The full version is available at http://paste.openstack.org/show/242594/
Steps to reproduce:
1. Install an environment with Swift, with 3 controllers and 1 compute node.
2. Connect to one of the controllers and disable the management network there using the following command:
iptables -I INPUT -i br-mgmt -j DROP && iptables -I OUTPUT -o br-mgmt -j DROP
3. Connect to _another_ controller and run the 'swift list' command 10 times there.
Sometimes the command takes a long time: more than a minute. On average, when it happens, the response returns in 70 seconds. It might happen every time, or every 2nd or 3rd time, depending on circumstances I do not understand.
Analysis:
The issue occurs when haproxy sends the Swift request to the firewalled node. Swift on that node tries to validate the user's token and times out because it cannot connect to Keystone's admin URL (which is on the management network). It waits for a response for 1 minute and then resends the request to another node. As a result, the request takes slightly more than a minute to be processed.
A similar issue would happen with other OpenStack components, but haproxy detects that all services on the node except Swift are dead. Haproxy detects service failure by accessing each service's endpoint, which listens on the management (br-mgmt) network, which is firewalled. Swift's endpoint listens on the storage interface (br-storage), so haproxy thinks that Swift is alive on the firewalled node.
In general, the problem is that haproxy's health checks are too 'weak': it is not enough to check that a service's port is accessible. We probably need to temporarily disable a service on a node if it constantly fails.
Attached is a snapshot of the environment, in which the management interface of one node (node-2) was firewalled. In the haproxy log of node-1 you can see how Swift requests were handled. In the swift-proxy log of node-2 you can see Swift trying to connect to Keystone. The snapshot can be downloaded from: https://drive.google.com/file/d/0B_TRgCViR_cIQVpLQXJ5aVlnUTQ/view?usp=sharing |
Version: 6.1, ISO #474.
The full version is available at http://paste.openstack.org/show/242594/
Steps to reproduce:
1. Install an environment with Swift, with 3 controllers and 1 compute node.
2. Connect to one of the controllers and disable the management network there using the following command:
iptables -I INPUT -i br-mgmt -j DROP && iptables -I OUTPUT -o br-mgmt -j DROP
3. Connect to _another_ controller and run the 'swift list' command 10 times there.
Sometimes the command takes a long time: more than a minute. On average, when it happens, the response returns in 70 seconds. It might happen every time, or every 2nd or 3rd time, depending on circumstances I do not understand.
Analysis:
The issue occurs when haproxy sends the user's request for Swift to the firewalled node. Swift on that node tries to validate the user's token and times out because it cannot connect to Keystone's admin URL (which is on the management network). It waits for a response for 1 minute and then resends the request to another node. As a result, the request takes slightly more than a minute to be processed.
A similar issue would happen with other OpenStack components, but haproxy detects that all services on the node except Swift are dead. Haproxy detects service failure by accessing each service's endpoint, which listens on the management (br-mgmt) network, which is firewalled. Swift's endpoint listens on the storage interface (br-storage), so haproxy thinks that Swift is alive on the firewalled node.
In general, the problem is that haproxy's health checks are too 'weak': it is not enough to check that a service's port is accessible. We probably need to temporarily disable a service on a node if it constantly fails.
Attached is a snapshot of the environment, in which the management interface of one node (node-2) was firewalled. In the haproxy log of node-1 you can see how Swift requests were handled. In the swift-proxy log of node-2 you can see Swift trying to connect to Keystone. The snapshot can be downloaded from: https://drive.google.com/file/d/0B_TRgCViR_cIQVpLQXJ5aVlnUTQ/view?usp=sharing |
|
2015-05-28 18:27:56 |
Dmitry Mescheryakov |
description |
Version: 6.1, ISO #474.
The full version is available at http://paste.openstack.org/show/242594/
Steps to reproduce:
1. Install an environment with Swift, with 3 controllers and 1 compute node.
2. Connect to one of the controllers and disable the management network there using the following command:
iptables -I INPUT -i br-mgmt -j DROP && iptables -I OUTPUT -o br-mgmt -j DROP
3. Connect to _another_ controller and run the 'swift list' command 10 times there.
Sometimes the command takes a long time: more than a minute. On average, when it happens, the response returns in 70 seconds. It might happen every time, or every 2nd or 3rd time, depending on circumstances I do not understand.
Analysis:
The issue occurs when haproxy sends the user's request for Swift to the firewalled node. Swift on that node tries to validate the user's token and times out because it cannot connect to Keystone's admin URL (which is on the management network). It waits for a response for 1 minute and then resends the request to another node. As a result, the request takes slightly more than a minute to be processed.
A similar issue would happen with other OpenStack components, but haproxy detects that all services on the node except Swift are dead. Haproxy detects service failure by accessing each service's endpoint, which listens on the management (br-mgmt) network, which is firewalled. Swift's endpoint listens on the storage interface (br-storage), so haproxy thinks that Swift is alive on the firewalled node.
In general, the problem is that haproxy's health checks are too 'weak': it is not enough to check that a service's port is accessible. We probably need to temporarily disable a service on a node if it constantly fails.
Attached is a snapshot of the environment, in which the management interface of one node (node-2) was firewalled. In the haproxy log of node-1 you can see how Swift requests were handled. In the swift-proxy log of node-2 you can see Swift trying to connect to Keystone. The snapshot can be downloaded from: https://drive.google.com/file/d/0B_TRgCViR_cIQVpLQXJ5aVlnUTQ/view?usp=sharing |
Version: 6.1, ISO #474.
The full version is available at http://paste.openstack.org/show/242594/
Steps to reproduce:
1. Install an environment with Swift, with 3 controllers and 1 compute node.
2. Connect to one of the controllers and disable the management network there using the following command:
iptables -I INPUT -i br-mgmt -j DROP && iptables -I OUTPUT -o br-mgmt -j DROP
3. Connect to _another_ controller and run the 'swift list' command 10 times there.
Sometimes the command takes a long time: more than a minute. On average, when it happens, the response returns in 70 seconds. It might happen every time, or every 2nd or 3rd time, depending on circumstances I do not understand.
Analysis:
The issue occurs when haproxy sends the user's request for Swift to the firewalled node. Swift on that node tries to validate the user's token and times out because it cannot connect to Keystone's admin URL (which is on the management network). Haproxy waits for a response for 1 minute and then resends the request to another node. As a result, the request takes slightly more than a minute to be processed.
A similar issue would happen with other OpenStack components, but haproxy detects that all services on the node except Swift are dead. Haproxy detects service failure by accessing each service's endpoint, which listens on the management (br-mgmt) network, which is firewalled. Swift's endpoint listens on the storage interface (br-storage), so haproxy thinks that Swift is alive on the firewalled node.
In general, the problem is that haproxy's health checks are too 'weak': it is not enough to check that a service's port is accessible. We probably need to temporarily disable a service on a node if it constantly fails.
Attached is a snapshot of the environment, in which the management interface of one node (node-2) was firewalled. In the haproxy log of node-1 you can see how Swift requests were handled. In the swift-proxy log of node-2 you can see Swift trying to connect to Keystone. The snapshot can be downloaded from: https://drive.google.com/file/d/0B_TRgCViR_cIQVpLQXJ5aVlnUTQ/view?usp=sharing |
|
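The 'weak' health check described in the analysis is haproxy's default TCP connect check against the Swift endpoint on br-storage: the port answers, so the backend stays up. A sketch of a stronger, HTTP-level check follows (illustrative only, not the fix that was actually committed; the backend name, addresses, and ports are assumptions, while `option httpchk` is a standard haproxy directive and /healthcheck is Swift's healthcheck middleware endpoint):

```
backend swift-api
  # A bare 'check' only verifies the TCP port is open on br-storage,
  # so a node with a firewalled br-mgmt still looks healthy.
  # 'option httpchk' makes haproxy issue a real HTTP request instead;
  # Swift's /healthcheck endpoint returns 200 OK when the proxy is up.
  option httpchk GET /healthcheck
  server node-1 192.168.1.2:8080 check inter 10s fall 3 rise 2
  server node-2 192.168.1.3:8080 check inter 10s fall 3 rise 2
```

Note that even this check would not catch this particular failure mode, since /healthcheck does not exercise the swift-proxy-to-Keystone path; it only illustrates why a port-open probe is the weakest possible signal, in line with the report's suggestion that failing backends need deeper checks or temporary disabling.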
2015-05-28 18:35:11 |
Dmitry Mescheryakov |
mos: assignee |
|
Fuel Library Team (fuel-library) |
|
2015-05-28 18:35:14 |
Dmitry Mescheryakov |
mos: status |
New |
Confirmed |
|
2015-05-29 07:16:47 |
Vladimir Kuklin |
tags |
|
to-be-covered-by-system-tests |
|
2015-05-29 07:20:45 |
Vladimir Kuklin |
bug task added |
|
fuel |
|
2015-05-29 07:20:57 |
Vladimir Kuklin |
fuel: assignee |
|
Fuel Library Team (fuel-library) |
|
2015-05-29 07:21:02 |
Vladimir Kuklin |
fuel: milestone |
|
6.1 |
|
2015-05-29 07:21:05 |
Vladimir Kuklin |
fuel: importance |
Undecided |
High |
|
2015-05-29 07:21:07 |
Vladimir Kuklin |
fuel: status |
New |
Triaged |
|
2015-05-29 07:21:11 |
Vladimir Kuklin |
bug task deleted |
mos |
|
|
2015-05-29 07:42:18 |
Bogdan Dobrelya |
tags |
to-be-covered-by-system-tests |
low-hanging-fruit to-be-covered-by-system-tests |
|
2015-05-29 07:50:04 |
Nastya Urlapova |
tags |
low-hanging-fruit to-be-covered-by-system-tests |
|
|
2015-05-29 08:22:07 |
Bogdan Dobrelya |
tags |
|
low-hanging-fruit |
|
2015-05-29 09:38:16 |
Bogdan Dobrelya |
fuel: assignee |
Fuel Library Team (fuel-library) |
Bogdan Dobrelya (bogdando) |
|
2015-05-29 09:56:58 |
Bogdan Dobrelya |
nominated for series |
|
fuel/6.0.x |
|
2015-05-29 09:56:58 |
Bogdan Dobrelya |
bug task added |
|
fuel/6.0.x |
|
2015-05-29 09:57:03 |
Bogdan Dobrelya |
fuel/6.0.x: status |
New |
Triaged |
|
2015-05-29 09:57:05 |
Bogdan Dobrelya |
fuel/6.0.x: milestone |
|
6.0.2 |
|
2015-05-29 09:57:07 |
Bogdan Dobrelya |
fuel/6.0.x: importance |
Undecided |
High |
|
2015-05-29 09:57:14 |
Bogdan Dobrelya |
fuel/6.0.x: assignee |
|
Fuel Library Team (fuel-library) |
|
2015-05-29 10:04:43 |
Bogdan Dobrelya |
tags |
low-hanging-fruit |
low-hanging-fruit to-be-covered-by-tests |
|
2015-05-29 10:07:44 |
Bogdan Dobrelya |
tags |
low-hanging-fruit to-be-covered-by-tests |
low-hanging-fruit swift to-be-covered-by-tests |
|
2015-05-29 11:04:23 |
Bogdan Dobrelya |
tags |
low-hanging-fruit swift to-be-covered-by-tests |
swift to-be-covered-by-tests |
|
2015-05-29 16:06:23 |
OpenStack Infra |
fuel: status |
Triaged |
In Progress |
|
2015-06-01 08:36:03 |
Bogdan Dobrelya |
bug |
|
|
added subscriber Andrey Sledzinskiy |
2015-06-01 10:59:43 |
Vladimir Kuklin |
bug task added |
|
mos |
|
2015-06-01 10:59:49 |
Vladimir Kuklin |
mos: status |
New |
Triaged |
|
2015-06-01 10:59:54 |
Vladimir Kuklin |
mos: importance |
Undecided |
High |
|
2015-06-01 11:00:01 |
Vladimir Kuklin |
mos: assignee |
|
MOS Swift (mos-swift) |
|
2015-06-01 11:00:03 |
Vladimir Kuklin |
mos: milestone |
|
7.0 |
|
2015-06-01 11:57:48 |
Bogdan Dobrelya |
summary |
Disabling management net on a single swift node leads to a very long swift response time |
Disabling management net on a single swift proxy node leads to a very long swift response time |
|
2015-06-01 14:58:20 |
OpenStack Infra |
fuel: assignee |
Bogdan Dobrelya (bogdando) |
Vladimir Kuklin (vkuklin) |
|
2015-06-01 18:38:44 |
OpenStack Infra |
fuel: status |
In Progress |
Fix Committed |
|
2015-06-02 09:19:38 |
Bogdan Dobrelya |
fuel: assignee |
Vladimir Kuklin (vkuklin) |
Bogdan Dobrelya (bogdando) |
|
2015-07-13 10:15:10 |
Bogdan Dobrelya |
fuel/6.0.x: assignee |
Fuel Library Team (fuel-library) |
MOS Sustaining (mos-sustaining) |
|
2015-08-20 13:52:08 |
Timur Nurlygayanov |
mos: assignee |
MOS Swift (mos-swift) |
Fuel Library Team (fuel-library) |
|
2015-08-20 13:52:28 |
Timur Nurlygayanov |
mos: assignee |
Fuel Library Team (fuel-library) |
Vladimir Kuklin (vkuklin) |
|
2015-08-20 13:52:31 |
Timur Nurlygayanov |
mos: status |
Triaged |
Fix Committed |
|
2015-09-07 09:12:15 |
Alexander Arzhanov |
tags |
swift to-be-covered-by-tests |
on-verification swift to-be-covered-by-tests |
|
2015-09-08 11:51:28 |
Alexander Arzhanov |
mos: status |
Fix Committed |
Fix Released |
|
2015-09-08 11:51:41 |
Alexander Arzhanov |
tags |
on-verification swift to-be-covered-by-tests |
swift to-be-covered-by-tests |
|
2015-09-09 16:09:17 |
Vitaly Sedelnik |
nominated for series |
|
fuel/6.0-updates |
|
2015-09-09 16:09:17 |
Vitaly Sedelnik |
bug task added |
|
fuel/6.0-updates |
|
2015-09-09 16:09:52 |
Vitaly Sedelnik |
fuel/6.0-updates: status |
New |
Triaged |
|
2015-09-09 16:09:54 |
Vitaly Sedelnik |
fuel/6.0-updates: importance |
Undecided |
High |
|
2015-09-09 16:10:04 |
Vitaly Sedelnik |
fuel/6.0-updates: assignee |
|
MOS Maintenance (mos-maintenance) |
|
2015-09-09 16:10:09 |
Vitaly Sedelnik |
fuel/6.0-updates: milestone |
|
6.0-updates |
|
2015-09-09 16:10:16 |
Vitaly Sedelnik |
fuel/6.0.x: status |
Triaged |
Won't Fix |
|
2015-09-26 10:57:10 |
Vitaly Sedelnik |
fuel/6.0.x: milestone |
6.0.2 |
6.0.1 |
|