Killing one RabbitMQ node causes complete Swift outage

Bug #1560055 reported by Dmitry Mescheryakov
This bug affects 1 person
Affects             Status        Importance  Assigned to     Milestone
Fuel for OpenStack  Fix Released  High        Kyrylo Galanov
8.0.x               Won't Fix     High        Kyrylo Galanov
Mitaka              Fix Released  High        Kyrylo Galanov

Bug Description

Version: 8.0

1. Install environment with Ceilometer and Swift consisting of 1 controller node, 1 compute node and 3 rabbitmq nodes (using https://github.com/openstack/fuel-plugin-detach-rabbitmq plugin)
2. Log into controller and examine /etc/swift/proxy-server.conf there, find the following line
    url = rabbit://nova:SaS7ZhUNQELx19vsBZz4krZS@192.168.0.6:5673,192.168.0.7:5673,192.168.0.4:5673//

3. Find out, which RabbitMQ node has the first IP in the list and power it off.
4. Try running
    swift --debug list --os-auth-url http://<VIP>:35357/v2.0

Expected result
    The command succeeds

Actual result
    The command hangs until HAProxy returns a 504

The problem is that credentials need to be passed to each server separately, like this:
url = rabbit://nova:SaS7ZhUNQELx19vsBZz4krZS@192.168.0.6:5673,nova:SaS7ZhUNQELx19vsBZz4krZS@192.168.0.7:5673,nova:SaS7ZhUNQELx19vsBZz4krZS@192.168.0.4:5673//

Otherwise, Swift connects to the 2nd and 3rd RabbitMQ nodes with the default guest/guest credentials.
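
The rewrite described above can be sketched in Python. This is only an illustration of the URL transformation, not the actual fix that went into puppet-swift; the function name add_credentials_to_all_hosts and the user:secret credentials are hypothetical, and the sketch assumes the credentials contain no "/" or "," characters.

```python
def add_credentials_to_all_hosts(url):
    """Copy the first host's user:password onto every host that lacks one.

    Assumes the form scheme://user:pass@host1:port,host2:port/vhost and
    that the credentials themselves contain no "/" or "," characters.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("not a transport URL: %r" % url)

    # Everything before the first "/" is the comma-separated host list.
    hosts_part, slash, vhost = rest.partition("/")
    hosts = hosts_part.split(",")

    # Credentials are whatever precedes the last "@" of the first host.
    first_creds, at, _ = hosts[0].rpartition("@")
    if not at:
        return url  # no credentials anywhere, nothing to copy

    fixed = [h if "@" in h else first_creds + "@" + h for h in hosts]
    return scheme + "://" + ",".join(fixed) + slash + vhost


if __name__ == "__main__":
    broken = "rabbit://user:secret@192.168.0.6:5673,192.168.0.7:5673,192.168.0.4:5673//"
    print(add_credentials_to_all_hosts(broken))
```

Hosts that already carry their own credentials are left untouched, so running the function on an already correct URL returns it unchanged.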

9.0 seems to be affected as well, as it has the same parameter in proxy-server.conf. Most probably the issue can be reproduced without the detach-rabbitmq plugin if you install 3 controllers and shut down 2 of the 3 proxy-server processes.

Dmitry Klenov (dklenov)
tags: added: area-library
tags: added: swift team-bugfix
tags: added: low-hanging-fruit
Matthew Mosesohn (raytrac3r) wrote :

This is not valid. Dmitry had invalid network settings on the added node.

Dmitry Mescheryakov (dmitrymex) wrote :

Matt, you confused this bug with another case I asked you to help me with :-) . This bug is valid and should be reproducible without the plugin. Here Swift simply has incorrect oslo.messaging settings: credentials are provided for the first node only.

Kyrylo Galanov (kgalanov) wrote :
Dmitry Pyzhov (dpyzhov)
tags: added: area-mos
removed: area-library
tags: added: on-verification
Kyrylo Romanenko (kromanenko) wrote :

I have a question: how many controllers should there be?
The description says "1 controller node", but step 3 says "Find out, which controller has the first IP".
I have deployed 1 Controller, 1 Compute and 3 RabbitMQ nodes. Is that enough?

Check:

1.
root@node-1:~# cat /etc/swift/proxy-server.conf | grep url
pipeline = catch_errors crossdomain healthcheck cache bulk tempurl ratelimit formpost swift3 s3token authtoken keystone staticweb container_quotas account_quotas slo proxy-server
[filter:tempurl]
use = egg:swift#tempurl

It seems there is no "url = rabbit://nova:..." line in proxy-server.conf, so this check failed.

2.
root@node-1:~# swift --debug list --os-auth-url http://192.168.0.2:35357/v2.0
DEBUG:keystoneclient.auth.identity.v2:Making authentication request to http://192.168.0.2:35357/v2.0/tokens
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 192.168.0.2
DEBUG:requests.packages.urllib3.connectionpool:"POST /v2.0/tokens HTTP/1.1" 200 5168
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 192.168.0.2
DEBUG:requests.packages.urllib3.connectionpool:"GET /v1/AUTH_3b5fada88a584fa4b18b0de95b2eb7a0?format=json HTTP/1.1" 200 2
DEBUG:swiftclient:REQ: curl -i http://192.168.0.2:8080/v1/AUTH_3b5fada88a584fa4b18b0de95b2eb7a0?format=json -X GET -H "X-Auth-Token: gAAAAABXTVgPcOIWFH-fnZuPfgShn9fSzl60Zw0qkc32FyhnLsrD0FXNbeDbHWFQCNspYsTuePJtQ3_WrRkR_oQjEk7qRLHRW9hZ72gzIGs1-5v5WkIYmsUjNx-D56lDWJWJS5Za2pRwfsIxkZg3kIeIIOZd0Da6csIYG_v9IteJpyVtfpMeBoA"
DEBUG:swiftclient:RESP STATUS: 200 OK
DEBUG:swiftclient:RESP HEADERS: {u'Content-Length': u'2', u'X-Put-Timestamp': u'1464686607.83794', u'X-Account-Object-Count': u'0', u'Connection': u'close', u'X-Timestamp': u'1464686607.83794', u'X-Trans-Id': u'txe62c62a1c9fd45acb2290-00574d580f', u'Date': u'Tue, 31 May 2016 09:23:27 GMT', u'X-Account-Bytes-Used': u'0', u'X-Account-Container-Count': u'0', u'Content-Type': u'application/json; charset=utf-8'}
DEBUG:swiftclient:RESP BODY: []

This check looks good.

# shotgun2 short-report
cat /etc/fuel_build_id:
 416
cat /etc/fuel_build_number:
 416
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6347.noarch
 fuel-bootstrap-cli-9.0.0-1.mos284.noarch
 fuel-migrate-9.0.0-1.mos8398.noarch
 rubygem-astute-9.0.0-1.mos746.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8709.noarch
 network-checker-9.0.0-1.mos72.x86_64
 fuel-mirror-9.0.0-1.mos137.noarch
 fuel-openstack-metadata-9.0.0-1.mos8709.noarch
 fuel-notify-9.0.0-1.mos8398.noarch
 nailgun-mcagents-9.0.0-1.mos746.noarch
 python-fuelclient-9.0.0-1.mos316.noarch
 fuelmenu-9.0.0-1.mos270.noarch
 fuel-9.0.0-1.mos6347.noarch
 fuel-utils-9.0.0-1.mos8398.noarch
 fuel-setup-9.0.0-1.mos6347.noarch
 fuel-library9.0-9.0.0-1.mos8398.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-agent-9.0.0-1.mos284.noarch
 fuel-ui-9.0.0-1.mos2706.noarch
 fuel-ostf-9.0.0-1.mos934.noarch
 fuel-misc-9.0.0-1.mos8398.noarch
 python-packetary-9.0.0-1.mos137.noarch
 fuel-nailgun-9.0.0-1.mos8709.noarch

Dmitry Mescheryakov (dmitrymex) wrote :

Kyrylo, that is a good catch. Here the phrase "3. Find out, which controller has the first IP in the list" should actually read "3. Find out, which RabbitMQ node has the first IP in the list". I will correct the description accordingly.

Regarding the fact that you did not find the 'url' parameter in proxy-server.conf: did you install Ceilometer on the environment? It is essential for the reproduction, as otherwise Swift is not configured to send notifications.
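
Since a plain grep for "url" also matches "tempurl", a more targeted check may help when verifying. The following Python sketch looks for an option named exactly "url" with a rabbit:// value and lists the hosts that carry no credentials. The function names are hypothetical, and the [filter:ceilometer] section name in the sample is an assumption about where the option lives when Ceilometer is enabled, not something confirmed in this report.

```python
import configparser


def find_rabbit_urls(conf_text):
    """Return (section, url) pairs for options named exactly 'url'
    whose value starts with rabbit://."""
    # RawConfigParser does no %-interpolation; strict=False tolerates
    # duplicate options.
    parser = configparser.RawConfigParser(strict=False)
    parser.read_string(conf_text)
    found = []
    for section in parser.sections():
        value = parser.get(section, "url", fallback=None)
        if value and value.startswith("rabbit://"):
            found.append((section, value))
    return found


def hosts_without_credentials(url):
    """Return the host entries of a transport URL that have no user:pass@ prefix."""
    rest = url.partition("://")[2]
    hosts_part = rest.partition("/")[0]
    return [h for h in hosts_part.split(",") if "@" not in h]


if __name__ == "__main__":
    # Path taken from the bug description.
    with open("/etc/swift/proxy-server.conf") as f:
        for section, url in find_rabbit_urls(f.read()):
            print(section, hosts_without_credentials(url))
```

An empty result from find_rabbit_urls suggests that Swift is not configured to send notifications at all, which would match an environment deployed without Ceilometer.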

description: updated
Kyrylo Romanenko (kromanenko) wrote :

Thanks for the details!
Verification passed on ISO 416.

Thierry Carrez (ttx) wrote : Fix included in openstack/puppet-swift 8.1.0

This issue was fixed in the openstack/puppet-swift 8.1.0 release.

Changed in fuel:
status: Fix Committed → Fix Released
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/puppet-swift 9.0.0

This issue was fixed in the openstack/puppet-swift 9.0.0 release.
