Swift Erasure Code fails with liberasurecode 1.4.0 on CentOS
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
liberasurecode | | Undecided | Unassigned | | |
openstack-ansible | | Low | Andy McCrae | | |
Bug Description
Our Swift gate tests are failing intermittently on CentOS 7 in the "cross policy write" tests, which exercise cross-policy writes as well as Erasure Code (the second policy in testing is an EC policy) ( Sample gate failure - http://
Manually trying to upload objects to Swift shows the following:
(swift-untagged) [root@swift-
(swift-untagged) [root@swift-
('Connection aborted.', BadStatusLine(
(swift-untagged) [root@swift-
(swift-untagged) [root@swift-
test.file
The non-ec container upload works fine, whereas the erasure code upload fails.
The version of liberasurecode deployed is:
(swift-untagged) [root@swift-
liberasurecode-
liberasurecode-
Updating to 1.5.0 works though:
[root@swift-
[root@swift-
[root@swift-
[root@swift-
liberasurecode-
liberasurecode-
Now after restarting swift services, the upload succeeds:
(swift-untagged) [root@swift-
test.file
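A quick way to flag hosts that still carry the affected package is to compare the installed version string against 1.5.0. The following is a minimal sketch, not part of the original report: the package names in the usage note are hypothetical examples (the real ones are truncated above), and treating everything older than 1.5.0 as affected is an assumption based only on the observation that 1.4.0 failed and 1.5.0 worked.

```python
import re

# Assumption: 1.5.0 is treated as the first known-good release, based only on
# the report that upgrading from 1.4.0 to 1.5.0 fixed the EC upload.
MIN_GOOD_VERSION = (1, 5, 0)


def parse_version(text):
    """Return the first dotted x.y.z version found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(package_name):
    """True when the package version sorts below the assumed known-good release."""
    return parse_version(package_name) < MIN_GOOD_VERSION
```

For example, `is_affected("liberasurecode-1.4.0-1.el7.x86_64")` is true and `is_affected("liberasurecode-1.5.0-1.el7.x86_64")` is false. Comparing integer tuples avoids the string-comparison trap where "1.10.0" would sort before "1.5.0".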
=======
Tested against stable/ocata and Master for Swift.
For reference the CentOS7 kernel being used is:
[root@swift-cent openstack-
3.10.0-
David Moreau Simard (dmsimard) wrote : | #1 |
clayg (clay-gerrard) wrote : | #2 |
The information needed to debug this is in the proxy log lines.
If newer liberasurecode fixes the issue - isn't this bug already "Fix Released"?
Changed in openstack-ansible: | |
status: | New → Incomplete |
status: | Incomplete → New |
clayg (clay-gerrard) wrote : | #3 |
Sorry, I thought this was filed as a libec bug - I don't think I have anything helpful to contribute here - sorry.
Tim Burke (1-tim-z) wrote : | #4 |
It sounds like the proxy worker died trying to service the request, and each time the parent daemon spawned a new one... all the "Removing dead child <pid>" messages like http://
Are there any core dumps that get produced?
What's the config for the EC policy? ec_type / ec_num_
Andy McCrae (andrew-mccrae) wrote : | #5 |
Thanks for the response Tim - I know it's not technically a "libec" or swift issue as such, but it would be cool to debug it further (I'm pretty sure we ran into a similar situation last cycle)
Here is the section for the ec-tests storage policy:
[storage-policy:1]
name = ec-tests
policy_type = erasure_coding
ec_type = liberasurecode_
ec_num_
ec_num_
ec_object_
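For anyone who wants to pull the same EC settings out of a swift.conf programmatically, here is a rough sketch using Python's standard configparser. The sample file contents are illustrative only: the ec_type suffix, fragment counts and segment size in the paste above are truncated, so the values below are placeholders and not the reporter's actual configuration.

```python
import configparser

# Placeholder values: NOT the reporter's settings, which are truncated above.
SAMPLE_SWIFT_CONF = """
[storage-policy:0]
name = default
default = yes

[storage-policy:1]
name = ec-tests
policy_type = erasure_coding
ec_type = liberasurecode_rs_vand
ec_num_data_fragments = 2
ec_num_parity_fragments = 1
ec_object_segment_size = 1048576
"""


def ec_policies(conf_text):
    """Map each erasure-coded policy name to its ec_* options."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    found = {}
    for section in parser.sections():
        if not section.startswith("storage-policy:"):
            continue
        # Policies without an explicit policy_type are treated as replication.
        policy_type = parser.get(section, "policy_type", fallback="replication")
        if policy_type != "erasure_coding":
            continue
        found[parser.get(section, "name")] = {
            key: value
            for key, value in parser.items(section)
            if key.startswith("ec_")
        }
    return found


if __name__ == "__main__":
    for name, options in ec_policies(SAMPLE_SWIFT_CONF).items():
        print(name, options)
```

On a real node the text would come from the deployed swift.conf rather than the embedded sample string.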
A couple things to note, on a "not working" install, I can update to liberasurecode-
I've done a package comparison between a working build from http://
[root@swift-
60d59
< gpg-pubkey-
So I don't think there is an issue with different installed packages.
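The same comparison can be done without diff by treating the two package lists as sets. This is only a sketch: the package names in the usage note are made up for illustration, and the input is assumed to be one package name per line, as produced by something like a sorted rpm listing.

```python
def package_diff(working, failing):
    """Return (only_in_working, only_in_failing) as sorted lists."""
    working_set, failing_set = set(working), set(failing)
    return sorted(working_set - failing_set), sorted(failing_set - working_set)


def read_package_list(text):
    """Split listing output into package names, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With hypothetical inputs such as `read_package_list("pkg-a-1.0\ngpg-pubkey-example\n")` for the working host and `read_package_list("pkg-a-1.0\n")` for the failing one, `package_diff` returns `(["gpg-pubkey-example"], [])`, which corresponds to the single `<` line in the diff output above.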
Here is a coredump (or at least the first 10 lines from the backtrace): http://
I can get more if that'd help! (It's on liberasurecode-
Tim Burke (1-tim-z) wrote : | #6 |
Perfect, that's *exactly* what I needed!
> Program terminated with signal 4, Illegal instruction.
... with the backtrace landing right on a call to ceill -- looks like it matches the problem solved by https:/
I don't think Zaitcev ever made a liberasurecode bug for it, so I think I'll go ahead and associate this bug but mark it "Fix Released".
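To illustrate how a parent process sees a worker that dies this way, the sketch below starts a child that sends itself SIGILL (signal 4) and decodes the exit status. It is a generic POSIX demonstration under my own assumptions, not Swift's actual worker-management code; the helper names are hypothetical, and the child disables core dumps so the demo leaves no core file behind.

```python
import signal
import subprocess
import sys

# Child program: turn off core dumps, then kill itself with SIGILL,
# mimicking a worker that hits an illegal instruction.
CHILD_CODE = (
    "import os, resource, signal\n"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0))\n"
    "os.kill(os.getpid(), signal.SIGILL)\n"
)


def describe_exit(returncode):
    """Translate a subprocess return code into a signal name or exit status."""
    if returncode < 0:
        # subprocess reports death by signal N as a return code of -N.
        return signal.Signals(-returncode).name
    return f"exit {returncode}"


def run_crashing_child():
    """Run the self-crashing child and describe how it ended."""
    result = subprocess.run([sys.executable, "-c", CHILD_CODE])
    return describe_exit(result.returncode)


if __name__ == "__main__":
    print(run_crashing_child())
```

A parent that only logs the child's PID, like the "Removing dead child" messages mentioned earlier, hides this detail; decoding the status or inspecting a core dump is what reveals the illegal instruction.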
Changed in liberasurecode: | |
status: | New → Fix Released |
Andy McCrae (andrew-mccrae) wrote : | #7 |
Sweet! Thanks Tim, that should be enough to get a version bump inside of RDO - @dmsimard thoughts? :)
David Moreau Simard (dmsimard) wrote : | #8 |
Just cross referencing the Bugzilla on our end: https:/
I'm sure we'll update it, it's just a matter of time.
Haïkel Guémar (hguemar) wrote : | #9 |
Updates submitted in RDO repos: https:/
Please pay attention: the upstream developer told us to be careful with this update, so report any issue you find ASAP.
Changed in openstack-ansible: | |
assignee: | nobody → Andy McCrae (andrew-mccrae) |
status: | New → In Progress |
importance: | Undecided → Low |
David Moreau Simard (dmsimard) wrote : | #10 |
We'll be able to update eclib to 1.5.0 in RDO once upstream has bumped upper-constraints to 1.5.0 for Ocata. Tim proposed the bump here: https:/
Readable stack traces: http://paste.openstack.org/raw/616874/ logs.openstack.org/25/485225/5/check/gate-openstack-ansible-os_swift-ansible-func-centos-7/8ad31e6/logs/ara/result/28186869-2e52-467a-bbc0-03b39091963e/
Full task output: http://