Swift race condition on AIO

Bug #1274358 reported by Mark T. Voelker on 2014-01-30
This bug affects 1 person
Affects: Cisco Openstack
Importance: Medium
Assigned to: Chris Ricker

Bug Description

A few folks have reported seeing this on h.1, though inconsistently, which suggests a race condition that only sometimes manifests. It appears to become more prevalent with newer updates from StackForge pulled in after h.1. If this condition is encountered, the requisite services can generally be restarted manually, or a second puppet run performed, to rectify it ("service swift-container-replicator restart;service swift-container-sync restart;service swift-account-replicator restart"). The errors from the puppet run look like this:

Error: Could not start Service[swift-container-replicator]: Execution of '/sbin/start swift-container-replicator' returned 1:

Error: /Stage[main]/Swift::Storage::Container/Swift::Storage::Generic[container]/Service[swift-container-replicator]/ensure: change from stopped to running failed: Could not start Service[swift-container-replicator]: Execution of '/sbin/start swift-container-replicator' returned 1:

Error: Could not start Service[swift-container-sync]: Execution of '/sbin/start swift-container-sync' returned 1:

Error: /Stage[main]/Swift::Storage::Container/Service[swift-container-sync]/ensure: change from stopped to running failed: Could not start Service[swift-container-sync]: Execution of '/sbin/start swift-container-sync' returned 1:

Error: Could not start Service[swift-account-replicator]: Execution of '/sbin/start swift-account-replicator' returned 1:

Error: /Stage[main]/Swift::Storage::Account/Swift::Storage::Generic[account]/Service[swift-account-replicator]/ensure: change from stopped to running failed: Could not start Service[swift-account-replicator]: Execution of '/sbin/start swift-account-replicator' returned 1:

The root cause appears to be a race condition in which services may be started before the ringsync is complete, as evidenced by messages like the following in the upstart logs for the failed services:

IOError: [Errno 2] No such file or directory: '/etc/swift/account.ring.gz'
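
To illustrate the ordering problem, here is a minimal sketch of a pre-start check that waits for the ring files to appear before a service is started. It is not the fix that was committed for this bug. The function names, the timeout values, and the inclusion of container.ring.gz and object.ring.gz alongside the account.ring.gz seen in the log are all assumptions made for illustration.

```python
import os
import time

# Ring files the storage services load at startup. account.ring.gz is the one
# named in the upstart log above; the other two names are assumed here.
RING_NAMES = ("account.ring.gz", "container.ring.gz", "object.ring.gz")


def missing_rings(swift_dir="/etc/swift", names=RING_NAMES):
    """Return the ring files that are not yet present in swift_dir."""
    return [n for n in names if not os.path.isfile(os.path.join(swift_dir, n))]


def wait_for_rings(swift_dir="/etc/swift", timeout=60.0, interval=1.0):
    """Poll until every ring file exists.

    Returns True once all rings are present, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if not missing_rings(swift_dir):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A wrapper could call wait_for_rings() and start the replicator and sync services only when it returns True, which removes the window in which a service starts against an empty /etc/swift.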

Changed in openstack-cisco:
assignee: Chip (cbaesema) → Chris Ricker (chris-ricker)
importance: High → Medium
Mark T. Voelker (mvoelker) wrote :

Looks like we won't have time for this in h.2, but there is a simple workaround (either do a second puppet run or restart the three services manually) and therefore this isn't a showstopper.
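
The manual workaround can be scripted. The following is a rough sketch that assumes the "service <name> restart" command from the bug description is available on the node. The restart_services helper and its run parameter are hypothetical names introduced here, not part of any existing tool.

```python
import subprocess

# The three services named in the workaround in the bug description.
SERVICES = (
    "swift-container-replicator",
    "swift-container-sync",
    "swift-account-replicator",
)


def restart_services(services=SERVICES, run=subprocess.call):
    """Run 'service <name> restart' for each service.

    Returns the names of the services whose restart command exited non-zero.
    The run parameter is injectable so the logic can be exercised without
    touching real services.
    """
    failed = []
    for name in services:
        if run(["service", name, "restart"]) != 0:
            failed.append(name)
    return failed
```

Calling restart_services() with no arguments on an affected node runs the same three restarts as the one-line command above and reports any that still fail, which would suggest the ring files are still missing.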

Changed in openstack-cisco:
milestone: h.2 → i.0
Shweta P (shweta-ap05) wrote :

I see this error in the full HA (swift) setup as well. I do not see the ring.gz files on the swift storage nodes.

Changed in openstack-cisco:
status: In Progress → New
status: New → Confirmed
Chris Ricker (chris-ricker) wrote :
Changed in openstack-cisco:
status: Confirmed → In Progress
Changed in openstack-cisco:
status: In Progress → Fix Committed
Changed in openstack-cisco:
status: Fix Committed → Fix Released