swift-proxy is shown as running even if it's not working
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
tripleo | Fix Released | High | Emilien Macchi | |
Bug Description
While testing a TripleO environment, I stop the core resources (Galera, Redis and Rabbit) and wait 20 minutes. If the related services show no problems, either from a systemd perspective or on the cluster side (failed actions), I restart those core services. If everything comes back cleanly, the test passes.
When that test succeeds, I run a second one, which is basically deploying an instance. While loading an image into glance for that test, I get this error:
Error finding address for http://
Looking at the image status in glance I get this:
[stack@haa-01 ~]$ glance image-list
+------
| ID | Name | Disk Format | Container Format | Size | Status |
+------
| cf4bb285-
+------
So the image is queued and not available. This is related to swift: after the previous test, swift no longer responds correctly, and commands like **swift stat** hang.
The problem is in swift-proxy. The status of the service is:
● openstack-
Loaded: loaded (/usr/lib/
Active: active (running) since Fri 2016-10-28 03:16:54 UTC; 4h 33min ago
Main PID: 127252 (swift-proxy-ser)
CGroup: /system.
├─127252 /usr/bin/python2 /usr/bin/
└─127476 /usr/bin/python2 /usr/bin/
Oct 28 07:45:25 overcloud-
Oct 28 07:45:57 overcloud-
Oct 28 07:46:11 overcloud-
Oct 28 07:46:43 overcloud-
Oct 28 07:47:15 overcloud-
Oct 28 07:47:47 overcloud-
Oct 28 07:48:13 overcloud-
Oct 28 07:48:45 overcloud-
Oct 28 07:49:07 overcloud-
Oct 28 07:49:39 overcloud-
Hint: Some lines were ellipsized, use -l to show in full.
And this is really wrong: the status of the service shows *running*, but the service is not working.
The only way to fix things is to restart the service:
[root@overcloud
Then the status is alright:
[root@overcloud
● openstack-
Loaded: loaded (/usr/lib/
Active: active (running) since Fri 2016-10-28 07:50:44 UTC; 8s ago
Main PID: 6690 (swift-proxy-ser)
CGroup: /system.
├─6690 /usr/bin/python2 /usr/bin/
└─6707 /usr/bin/python2 /usr/bin/
Oct 28 07:50:45 overcloud-
Oct 28 07:50:45 overcloud-
Oct 28 07:50:45 overcloud-
Oct 28 07:50:46 overcloud-
Oct 28 07:50:46 overcloud-
Oct 28 07:50:46 overcloud-
Oct 28 07:50:46 overcloud-
Oct 28 07:50:46 overcloud-
Oct 28 07:50:46 overcloud-
Oct 28 07:50:46 overcloud-
Hint: Some lines were ellipsized, use -l to show in full.
And you can interact with swift again:
[root@overcloud
Auth version 1.0 requires ST_AUTH, ST_USER, and ST_KEY environment variables
to be set or overridden with -A, -U, or -K.
Auth version 2.0 requires OS_AUTH_URL, OS_USERNAME, OS_PASSWORD, and
OS_TENANT_NAME OS_TENANT_ID to be set or overridden with --os-auth-url,
--os-username, --os-password, --os-tenant-name or os-tenant-id. Note:
adding "-V 2" is necessary for this.
[root@overcloud
[root@overcloud
Account: AUTH_3a8bec1b90
Containers: 0
Objects: 0
Bytes: 0
X-Put-Timestamp: 1477641134.80437
X-Timestamp: 1477641134.80437
X-Trans-Id: tx78a006105d294
Content-Type: text/plain; charset=utf-8
This should not happen: the service must be able to reconnect by itself, without any manual intervention.
I've attached sosreports from all of the overcloud controllers.
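The mismatch described above (systemd reports *running* while requests hang) can only be seen by probing the service itself. A minimal sketch of such a probe, assuming the proxy is reachable over plain HTTP; the function name `probe`, the timeout value, and the URL in the usage comment are illustrative and not taken from this deployment:

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Classify an HTTP endpoint as 'ok', 'error' or 'hung'.

    'hung' means the request did not complete within `timeout` seconds,
    which is the symptom reported here for swift-proxy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "ok" if resp.status == 200 else "error"
    except urllib.error.HTTPError:
        # The server answered, but with a 4xx/5xx status.
        return "error"
    except urllib.error.URLError as exc:
        # Timeouts while connecting or sending are wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "hung"
        return "error"
    except (socket.timeout, TimeoutError):
        # Timeouts while waiting for the response are raised directly.
        return "hung"
    except OSError:
        # Connection reset and similar low-level failures.
        return "error"


# Hypothetical usage; host, port and path are assumptions:
#   probe("http://127.0.0.1:8080/healthcheck")
```

Swift ships a healthcheck middleware that answers on `/healthcheck` when it is part of the proxy pipeline; whether it is enabled on these controllers is not stated in this report.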
Changed in tripleo: | |
assignee: | nobody → Christian Schwede (cschwede) |
status: | New → In Progress |
Changed in tripleo: | |
importance: | Undecided → High |
milestone: | none → ocata-1 |
tags: | added: newton-backport-potential removed: glance reconnect swift |
This is most likely related to ceilometermiddleware, which is placed before the catch_errors middleware and raises these exceptions. Because these errors are not caught properly, the proxy-server can't process requests anymore.
Related bug report: https://bugs.launchpad.net/tripleo/+bug/1637471
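To illustrate why the ordering matters, here is a toy WSGI sketch. It is not Swift's or ceilometermiddleware's actual code: `failing_filter`, `catch_errors`, `proxy_app` and `call` are hypothetical stand-ins. A filter that raises is placed either outside or inside an error-catching filter:

```python
def failing_filter(app):
    """Stand-in for a middleware that raises while handling a request."""
    def middleware(environ, start_response):
        raise RuntimeError("backend unreachable")
    return middleware


def catch_errors(app):
    """Toy error-catching filter: turn any exception into a 500 response."""
    def middleware(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")])
            return [b"An error occurred"]
    return middleware


def proxy_app(environ, start_response):
    """Stand-in for the application at the end of the pipeline."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OK"]


def call(app):
    """Invoke a WSGI app once and return the status line it produced."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
    b"".join(app(environ, start_response))
    return captured["status"]


# Failing filter placed before (outside) catch_errors: the exception escapes.
unprotected = failing_filter(catch_errors(proxy_app))

# Failing filter placed after (inside) catch_errors: the client gets a 500.
protected = catch_errors(failing_filter(proxy_app))
```

Under these assumptions `call(protected)` yields a 500 status line, while `call(unprotected)` lets the `RuntimeError` propagate to the server. The sketch only shows the exception escaping; it does not reproduce the hang itself.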