ceph-radosgw restart fails
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| ceph (Ubuntu) | Fix Released | High | James Page | |
| Trusty | Fix Released | High | Liam Young | |
| Vivid | Fix Released | High | James Page | |
| Wily | Fix Released | High | James Page | |
Bug Description
Upstream Bug: http://
[Impact]
On 14.04 the restart target of the sysvinit script brings the service down
but sometimes fails to bring it back up again. There is a race between stop and start: in the failure case, the attempt to bring the service up runs before the old process has finished stopping, so the start sees the daemon as still running and never actually starts a new one.
The proposed fix updates /etc/init.d/radosgw so that the stop target
waits for up to 30 seconds for the service to stop cleanly.
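A minimal sketch of the kind of wait loop such a fix adds to the stop target is shown below. This is illustrative only: the pidfile path and exact messages are assumptions, not the actual patched script (the real change is the init-script-stop.patch in the diff listed under Related branches).

```sh
#!/bin/sh
# Sketch of a stop target that waits for the daemon to exit before
# returning, so a following 'start' cannot race with the dying process.
# Pidfile path and messages are assumptions for illustration.
PIDFILE=/var/run/ceph/radosgw.pid

do_stop() {
    [ -f "$PIDFILE" ] || return 0
    pid=$(cat "$PIDFILE")
    kill "$pid" 2>/dev/null || return 0
    # Poll for up to 30 seconds for the process to disappear.
    timeout=30
    while [ "$timeout" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        timeout=$((timeout - 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        echo "radosgw did not stop within 30s" >&2
        return 1
    fi
    return 0
}
```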
[Test Case]
Bundle:
openstack-services:
  services:
    mysql:
      branch: lp:~openstack-charmers/charms/trusty/percona-cluster/next
      constraints: mem=1G
      options:
    ceph:
      branch: lp:~openstack-charmers/charms/trusty/ceph/next
      num_units: 3
      constraints: mem=1G
      options:
        fsid: 6547bd3e-
    keystone:
      branch: lp:~openstack-charmers/charms/trusty/keystone/next
      constraints: mem=1G
      options:
    ceph-radosgw:
      branch: lp:~openstack-charmers/charms/trusty/ceph-radosgw/next
      options:
  relations:
    - [ keystone, mysql ]
    - [ ceph-radosgw, keystone ]
    - [ ceph-radosgw, ceph ]
# kilo
trusty-kilo:
  inherits: openstack-services
  series: trusty
  overrides:
    openstack-
    source: cloud:trusty-kilo
trusty-icehouse:
  inherits: openstack-services
  series: trusty
$ juju-deployer -c next.yaml trusty-icehouse
$ juju ssh ceph-radosgw/0
$ sudo su -
# service radosgw status
/usr/bin/radosgw is running.
# service radosgw restart
Starting client.
/usr/bin/radosgw already running.
/usr/bin/radosgw is running.
# service radosgw status
/usr/bin/radosgw is not running.
# apt-cache policy radosgw
radosgw:
Installed: 0.80.10-
Candidate: 0.80.10-
Version table:
*** 0.80.10-
500 http://
100 /var/lib/
0.79-0ubuntu1 0
500 http://
root@juju-
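Because the failure is a race, a single restart can pass by luck. A small loop like the following (a sketch, using the same service name and status output shown in the transcript above) exercises the race more reliably:

```sh
#!/bin/sh
# Restart radosgw repeatedly and check it is still up after each
# restart; with the unpatched init script this usually fails within
# a few iterations.
i=1
while [ "$i" -le 20 ]; do
    service radosgw restart
    sleep 2
    if ! service radosgw status | grep -q 'is running'; then
        echo "radosgw down after restart #$i"
        exit 1
    fi
    i=$((i + 1))
done
echo "all 20 restarts succeeded"
```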
[Regression Potential]
* The only change in behaviour that would result from this change is that
running the stop target in the init script will wait for up to 30s before
exiting rather than returning immediately. I cannot think of any use cases
where this would be an issue.
[Original Bug Report]
job handler:
Jul 22 16:03:44 job-handler-1 ERR Failed to execute job: PUT request for http://
Other logs attached.
Related branches
- James Page: Pending requested 2015-09-07
Diff: 395 lines (+320/-1), 9 files modified:
.pc/1cca0c1.patch/src/init-radosgw (+98/-0)
.pc/1cca0c1.patch/src/init-radosgw.sysv (+112/-0)
.pc/applied-patches (+1/-0)
debian/changelog (+7/-0)
debian/patches/1cca0c1.patch (+60/-0)
debian/patches/init-script-stop.patch (+21/-0)
debian/patches/series (+1/-0)
src/init-radosgw (+9/-1)
src/init-radosgw.sysv (+11/-0)
| Andreas Hasenack (ahasenack) wrote : | #1 |
| tags: | removed: kanban |
| Andreas Hasenack (ahasenack) wrote : | #2 |
| Andreas Hasenack (ahasenack) wrote : | #3 |
| summary: | Internal server error when uploading to object store (ceph-radosgw) → ceph-radosgw died during deployment |
| information type: | Proprietary → Public |
Changing project to the ceph-radosgw charm
| affects: | landscape → ceph-radosgw (Juju Charms Collection) |
| Nobuto Murata (nobuto) wrote : | #5 |
FWIW, I'm also getting 500 frequently with 'FastCGI: incomplete headers (0 bytes) received from server "/var/www/
I'm using cloud:trusty-kilo.
| Alberto Donato (ack) wrote : | #6 |
I had a similar issue with a ceph/ceph OSA deploy using current stable charms (specifically, cs:trusty/
The autopilot fails while trying to upload simplestreams:
Aug 13 16:02:32 job-handler-1 INFO PUT http://
Last entry in radosgw.log shows the server was stopped:
2015-08-13 15:39:21.500670 7f09d10b47c0 0 ceph version 0.94.2 (5fb85614ca8f35
2015-08-13 15:39:24.407231 7f09d10b47c0 0 framework: civetweb
2015-08-13 15:39:24.407246 7f09d10b47c0 0 framework conf key: port, val: 70
2015-08-13 15:39:24.407270 7f09d10b47c0 0 starting handler: civetweb
2015-08-13 15:39:28.187979 7f09ad7fa700 -1 failed to list objects pool_iterate returned r=-2
2015-08-13 15:39:28.187990 7f09ad7fa700 0 ERROR: lists_keys_next(): ret=-2
2015-08-13 15:39:28.187995 7f09ad7fa700 0 ERROR: sync_all_users() returned ret=-2
2015-08-13 15:40:19.341212 7f09acff9700 1 handle_sigterm
2015-08-13 15:40:19.341248 7f09acff9700 1 handle_sigterm set alarm for 120
2015-08-13 15:40:19.341251 7f09d10b47c0 -1 shutting down
2015-08-13 15:40:19.458224 7f09acff9700 1 handle_sigterm
2015-08-13 15:40:19.458252 7f09acff9700 1 handle_sigterm set alarm for 120
2015-08-13 15:40:20.046138 7f09d10b47c0 1 final shutdown
| Alberto Donato (ack) wrote : | #7 |
| tags: | added: cpec |
| Liam Young (gnuoy) wrote : | #8 |
This is not a charm bug. It looks like an upstart script issue:
# service radosgw status
/usr/bin/radosgw is not running.
# service radosgw start
Starting client.
/usr/bin/radosgw is running.
# service radosgw status
/usr/bin/radosgw is running.
# service radosgw restart
Starting client.
/usr/bin/radosgw already running.
/usr/bin/radosgw is running.
# service radosgw status
/usr/bin/radosgw is not running.
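The transcript is consistent with the race described under [Impact]: a restart path of roughly the following shape (a simplified sketch, not the actual /etc/init.d/radosgw; pidfile path and messages are assumptions) issues stop and start back-to-back, and start still sees the old pid alive:

```sh
#!/bin/sh
# Simplified sketch of the pre-fix restart behaviour (illustrative only).
PIDFILE=/var/run/ceph/radosgw.pid

case "$1" in
  restart)
    # stop: send SIGTERM and return immediately, without waiting
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")"
    # start: the old daemon is usually still shutting down at this
    # point, so the liveness check passes and no new daemon is started
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "/usr/bin/radosgw already running."
    else
        /usr/bin/radosgw -c /etc/ceph/ceph.conf
    fi
    ;;
esac
```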
| Changed in ceph-radosgw (Juju Charms Collection): | |
| status: | New → Invalid |
| summary: | ceph-radosgw died during deployment → ceph-radosgw restart fails |
| Launchpad Janitor (janitor) wrote : | #9 |
Status changed to 'Confirmed' because the bug affects multiple users.
| Changed in ceph (Ubuntu): | |
| status: | New → Confirmed |
| affects: | ceph-radosgw (Ubuntu) → ceph (Ubuntu) |
| Changed in ceph (Ubuntu): | |
| status: | New → Confirmed |
| description: | updated |
| description: | updated |
| description: | updated |
| Changed in ceph (Ubuntu Wily): | |
| status: | Confirmed → Fix Released |
| Changed in ceph (Ubuntu Trusty): | |
| status: | New → Triaged |
| importance: | Undecided → High |
| Changed in ceph (Ubuntu Wily): | |
| importance: | Undecided → High |
| description: | updated |
| description: | updated |
| description: | updated |
| Changed in ceph (Ubuntu Wily): | |
| status: | Fix Released → Triaged |
| Changed in ceph (Ubuntu Vivid): | |
| status: | New → Triaged |
| importance: | Undecided → High |
| Changed in ceph (Ubuntu Wily): | |
| assignee: | nobody → James Page (james-page) |
| Changed in ceph (Ubuntu Vivid): | |
| assignee: | nobody → James Page (james-page) |
| Changed in ceph (Ubuntu Trusty): | |
| assignee: | nobody → Liam Young (gnuoy) |
| status: | Triaged → In Progress |
| Changed in ceph (Ubuntu Vivid): | |
| status: | Triaged → In Progress |
| Changed in ceph (Ubuntu Wily): | |
| status: | Triaged → In Progress |
| Launchpad Janitor (janitor) wrote : | #10 |
This bug was fixed in the package ceph - 0.94.3-0ubuntu2
---------------
ceph (0.94.3-0ubuntu2) wily; urgency=medium
* d/ceph.install: Drop ceph-deploy manpage from packaging, provided
by ceph-deploy itself (LP: #1475910).
-- James Page <email address hidden> Mon, 07 Sep 2015 14:42:03 +0100
| Changed in ceph (Ubuntu Wily): | |
| status: | In Progress → Fix Released |
| tags: | added: landscape-release-29 |
| Chad Smith (chad.smith) wrote : | #11 |
Will need to confirm once we have 0.94.3-0ubuntu2 available for deployment.
lp:1468335 seems very likely related.
| Chris J Arges (arges) wrote : | #12 |
This is blocked in the unapproved queue because bug 1475247 and bug 1477174 have not yet been verified. Please test those bugs first.
| Chris J Arges (arges) wrote : | #13 |
Hello Andreas, or anyone else affected,
Accepted ceph into vivid-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
| Changed in ceph (Ubuntu Vivid): | |
| status: | In Progress → Fix Committed |
| tags: | added: verification-needed |
| tags: | added: kanban-cross-team |
| tags: | removed: landscape-release-29 |
| Chris J Arges (arges) wrote : | #14 |
Hello Andreas, or anyone else affected,
Accepted ceph into trusty-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
| Changed in ceph (Ubuntu Trusty): | |
| status: | In Progress → Fix Committed |
| Changed in ceph (Ubuntu Vivid): | |
| status: | Fix Committed → Fix Released |
| Changed in ceph (Ubuntu Vivid): | |
| status: | Fix Released → Fix Committed |
| James Page (james-page) wrote : | #15 |
Tested from trusty proposed - restarts of radosgw are reliable post upgrade.
| tags: | added: verification-done verification-needed-vivid; removed: verification-needed |
| James Page (james-page) wrote : | #16 |
Also verified OK on vivid - restarts under systemd are now consistent.
| tags: | removed: verification-needed-vivid |
| Free Ekanayaka (free.ekanayaka) wrote : | #17 |
@James: is there a plan to upload the fix to the kilo/liberty trusty cloud archive too? That'd be the only way the Landscape Openstack Autopilot could get it I think.
| Launchpad Janitor (janitor) wrote : | #18 |
This bug was fixed in the package ceph - 0.80.10-
---------------
ceph (0.80.10-
* d/p/ceph-
ensure that restarts of the radosgw wait an appropriate amount of time
for the existing daemon to shutdown (LP: #1477225).
ceph (0.80.10-
* Switch to two step 'zapping' of disks, ensuring that disks with invalid
metadata don't cause hangs and are fully cleaned and initialized prior
to use (LP: #1475247).
-- Liam Young <email address hidden> Mon, 07 Sep 2015 16:00:31 +0100
| Changed in ceph (Ubuntu Trusty): | |
| status: | Fix Committed → Fix Released |
| Chris J Arges (arges) wrote : Update Released | #19 |
The verification of the Stable Release Update for ceph has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
| Launchpad Janitor (janitor) wrote : | #20 |
This bug was fixed in the package ceph - 0.94.3-
---------------
ceph (0.94.3-
[ James Page ]
* New upstream point release (LP: #1492227).
* d/ceph.install: Drop ceph-deploy manpage from packaging, provided
by ceph-deploy itself (LP: #1475910).
[ Liam Young ]
* d/p/ceph-
ensure that restarts of the radosgw wait an appropriate amount of time
for the existing daemon to shutdown (LP: #1477225).
-- James Page <email address hidden> Mon, 07 Sep 2015 16:01:46 +0100
| Changed in ceph (Ubuntu Vivid): | |
| status: | Fix Committed → Fix Released |
| affects: | ceph-radosgw (Juju Charms Collection) → ubuntu-translations |
| no longer affects: | ubuntu-translations |


ceph-radosgw just died. Last log entries from /var/log/ceph/radosgw.log:
2015-07-22 15:01:33.303237 7f46bd7fa700 1 handle_sigterm
2015-07-22 15:01:33.396803 7f46e14aa7c0 1 final shutdown
And nothing after that. Landscape got the first error at 15:03:57, and failed continuously until the end.
I logged in on the unit, and there was no radosgw process running. I started one by running the contents of /var/www/s3gw.fcgi:
exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
And then it worked.
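For reference, that wrapper is typically a one-line shell script along these lines (reconstructed from the command quoted above; the client name may differ per deployment):

```sh
#!/bin/sh
# /var/www/s3gw.fcgi: FastCGI wrapper normally spawned by Apache;
# running it by hand starts radosgw directly with the same config.
exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
```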
The object-internal-error.tar.xz has the inner logs in landscape-0-inner-logs/. You can find the /var/log contents from the ceph-radosgw/0 unit in landscape-0-inner-logs/ceph-radosgw-0/var/log/ for example.