Comment 0 for bug 1459772

Dmitry Mescheryakov (dmitrymex) wrote : Disabling management net on a single swift node leads to a very long swift response time

Version: 6.1, ISO #474.
Full version available at http://paste.openstack.org/show/242594/

Steps to reproduce:
1. Install an environment with Swift, 3 controllers, and 1 compute node.
2. Connect to one of the controllers and disable the management network there using the following command:
      iptables -I INPUT -i br-mgmt -j DROP && iptables -I OUTPUT -o br-mgmt -j DROP
3. Connect to _another_ controller and run 'swift list' there 10 times.

Sometimes the command takes a long time, more than a minute. When that happens, the response arrives after about 70 seconds on average. It may happen on every run, or on every 2nd or 3rd run, depending on circumstances I do not understand.
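
To put numbers on step 3, the runs can be timed with a small script. This is only a sketch: it assumes the swift client is installed on the controller and that the usual OS_* credentials are exported. The function names and the 60-second threshold are mine, not part of any tool.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=10):
    """Run cmd `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.monotonic() - start)
    return durations


def summarize(durations, slow_threshold=60.0):
    """Count the runs at or above slow_threshold (an assumed cut-off, in seconds)."""
    slow = [d for d in durations if d >= slow_threshold]
    return {
        "runs": len(durations),
        "slow_runs": len(slow),
        "mean_slow": statistics.mean(slow) if slow else None,
        "max": max(durations) if durations else None,
    }


# Example, to be run on the non-firewalled controller:
#   print(summarize(time_command(["swift", "list"])))
```

The summary separates slow runs from fast ones, which makes it easier to see how often a request landed on the firewalled node.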

Analysis:

The issue occurs when haproxy sends a Swift request to the firewalled node. Swift on that node tries to validate the user's token and times out because it cannot connect to Keystone's admin URL (which is on the management net). It waits for a response for 1 minute, and then resends the request to another node. As a result, the request takes slightly more than a minute to be processed.

A similar issue would affect other OpenStack components, but haproxy detects that all services on the node except Swift are dead. Haproxy detects a service failure by accessing the service's endpoint, which listens on the management (br-mgmt) network, which is firewalled. Swift's endpoint listens on the storage interface (br-storage), so haproxy thinks that Swift is alive on the firewalled node.
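
The gap between "the port answers" and "the service can do its job" can be illustrated with a check that runs on the node itself and also probes what the service depends on. This is a hypothetical sketch, not how haproxy or Swift currently check health. The function names are invented, and the addresses in the usage comment are placeholders.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def node_is_healthy(service, dependencies, timeout=2.0):
    """Healthy only if the service port AND every dependency port are reachable."""
    host, port = service
    return port_is_open(host, port, timeout=timeout) and all(
        port_is_open(dep_host, dep_port, timeout=timeout)
        for dep_host, dep_port in dependencies
    )


# Example with placeholder addresses (assumptions, not taken from the environment):
#   swift_proxy = ("192.168.1.2", 8080)      # storage network endpoint
#   keystone_admin = ("192.168.0.2", 35357)  # management network endpoint
#   node_is_healthy(swift_proxy, [keystone_admin])
```

On the firewalled node, a plain port check against the Swift endpoint would pass, while the combined check would fail because the Keystone admin endpoint is unreachable.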

In general, the problem is that haproxy health checks are too 'weak': it is not enough to check that a service's port is accessible. We probably need to temporarily disable a service on a node if it constantly fails.
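
One way to read "temporarily disable a service if it constantly fails" is a consecutive-failure counter with a cooldown. The sketch below only illustrates that idea. The class name, the thresholds, and the injected clock are all assumptions, and it does not describe an existing haproxy feature.

```python
import time


class FailureTracker:
    """Take a backend out of rotation for a cooldown after repeated failures."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before disabling
        self.cooldown = cooldown          # seconds the backend stays disabled
        self.clock = clock                # injectable so the logic can be tested
        self.failures = 0
        self.disabled_until = None

    def record(self, success):
        """Record the outcome of one request sent to the backend."""
        if success:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.disabled_until = self.clock() + self.cooldown
            self.failures = 0

    def is_enabled(self):
        """Return True if requests may be routed to the backend right now."""
        if self.disabled_until is None:
            return True
        if self.clock() >= self.disabled_until:
            # Cooldown is over; give the backend another chance.
            self.disabled_until = None
            return True
        return False
```

With something like this in front of Swift, the firewalled node would stop receiving requests after a few timeouts instead of delaying every request routed to it.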

Attached is a snapshot of the environment in which the management interface of one node (node-2) was firewalled. The haproxy log on node-1 shows how swift requests were handled. The swift-proxy log on node-2 shows swift trying to connect to keystone.