Swift-storage dies if rsyslog is stopped

Bug #1683076 reported by Jill Rouleau on 2017-04-15
This bug affects 3 people
Affects                          Importance   Assigned to
OpenStack swift-storage charm    Undecided    Unassigned
Ubuntu Cloud Archive             Undecided    Unassigned
    Declined for Juno by Corey Bryant
    Declined for Liberty by Corey Bryant
    Icehouse                     Critical     Unassigned
    Kilo                         Critical     Unassigned
    Mitaka                       Critical     Unassigned
    Newton                       Undecided    Unassigned
swift (Ubuntu)                   Undecided    Unassigned
    Trusty                       Critical     Unassigned
    Xenial                       Critical     Unassigned

Bug Description

Trusty, Mitaka, Juju 1.25.11

We have a cloud where swift replicators are constantly falling over on 2 nodes. This occurs whenever rsyslog restarts, as in

https://bugs.launchpad.net/swift/+bug/1094230
https://review.openstack.org/#/c/24871
https://bugs.python.org/issue15179

rsyslog restarts are unfortunately frequent right now, due to https://bugs.launchpad.net/juju-core/+bug/1683075

Nodes are landscape managed and up to date but still exhibit the failure.

Not much from running swift in verbose. https://pastebin.canonical.com/185609/

sosreports are uploading to https://private-fileshare.canonical.com/~jillr/sf00137831/

[Impact]

 * Stopping rsyslog crashes the swift daemons with a call stack overflow. When rsyslog stops, the /dev/log socket becomes unavailable, so any attempt to write a log entry raises an exception. The swift logging code then tries to log that error, which raises another exception, and the cycle repeats until the stack overflows and the daemon crashes. Once the daemons are down, object, container and account data can no longer be replicated to other storage nodes, which puts the integrity of data written to the system at risk.

 * The patch should be backported to stable releases to ensure that the integrity of objects, accounts, and containers within Swift is not adversely affected by a failed logging subsystem.

 * The uploaded patches fix the bug by only attempting to log an entry to the logging subsystem if the current call stack does not already include an attempt to write to the logs. If it does, the log message is dropped, avoiding the recursion.
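
The failure loop and the guard described above can be illustrated with a minimal, self-contained Python sketch. This is not the upstream patch: the names GuardedLogFile, BrokenSyslogHandler and demo are hypothetical, and the handler only simulates a syslog handler whose /dev/log socket is gone. It assumes the error report from the failed handler is routed back into the same logger, which is the feedback path described in this report.

```python
import logging
import threading


class GuardedLogFile:
    """Hypothetical file-like object that forwards writes to a logger."""

    _state = threading.local()  # per-thread flag: already inside write()?

    def __init__(self, logger):
        self.logger = logger
        self.dropped = 0

    def write(self, value):
        if getattr(self._state, "in_write", False):
            # A log attempt is already on this call stack: drop the
            # message instead of recursing.
            self.dropped += 1
            return
        self._state.in_write = True
        try:
            self.logger.error(value)
        finally:
            self._state.in_write = False


class BrokenSyslogHandler(logging.Handler):
    """Simulates a syslog handler whose /dev/log socket has disappeared."""

    def __init__(self, error_stream):
        super().__init__()
        self.error_stream = error_stream
        self.attempts = 0

    def emit(self, record):
        self.attempts += 1
        # Report the failure to the error stream, which feeds back into
        # the logger. Without the guard this would recurse until the
        # interpreter's recursion limit is hit.
        self.error_stream.write("cannot write to /dev/log")


def demo():
    logger = logging.getLogger("recursion-guard-demo")
    logger.propagate = False
    stream = GuardedLogFile(logger)
    handler = BrokenSyslogHandler(stream)
    logger.addHandler(handler)
    try:
        stream.write("replication pass finished")
    finally:
        logger.removeHandler(handler)
    return handler.attempts, stream.dropped
```

Calling demo() returns (1, 1): one attempt to emit the record, and one nested error message dropped rather than logged again. The flag is kept in thread-local storage so that a failure in one thread does not suppress logging in another.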

[Test Case]

 * Install swift storage cluster
 * Log into one of the swift storage nodes
 * Ensure the swift-{object,account,container}-replicator processes are running
 * Stop the rsyslog service
 * Wait a minute
 * Observe the swift-{object,account,container}-replicator processes are no longer running
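
The node-side steps above could be scripted roughly as follows. This is only a sketch under assumptions: it relies on pgrep and the service command being available on the storage node, it must run as root to stop rsyslog, and the function names are hypothetical. On an unpatched node the returned list is expected to name the replicators that died; on a patched node it should be empty.

```python
import subprocess
import time

DAEMONS = [
    "swift-object-replicator",
    "swift-account-replicator",
    "swift-container-replicator",
]


def is_running(name, run=subprocess.run):
    """Return True if pgrep finds a process whose command line matches name."""
    result = run(["pgrep", "-f", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def missing_daemons(run=subprocess.run):
    """List the replicator daemons that are not currently running."""
    return [name for name in DAEMONS if not is_running(name, run)]


def check(wait_seconds=60, run=subprocess.run, sleep=time.sleep):
    """Stop rsyslog, wait, and report which replicators have died."""
    already_down = missing_daemons(run)
    if already_down:
        raise RuntimeError("not running before the test: %s" % already_down)
    run(["service", "rsyslog", "stop"], check=True)  # requires root
    sleep(wait_seconds)
    return missing_daemons(run)


if __name__ == "__main__":
    print("replicators no longer running:", check())
```

The run and sleep parameters exist only so the logic can be exercised without touching a real system.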

[Regression Potential]

 * This affects the logging capabilities provided by the Swift code. Because logging is used throughout the code base, a regression could surface in almost any subsystem: lost log entries in the best case, and crashing swift daemons in the worst case. The regression potential is mitigated by the fact that this patch has been included upstream for over a year and no regressions have been reported against it.

[Other Info]

 * /dev/log is not provided by the rsyslog daemon in Xenial, but this patch still applies in that any persistent exception encountered when writing to /dev/log will cause the call stack to overflow and crash the swift daemons.

Billy Olsen (billy-olsen) wrote :

This appears to be a bug in the Swift logging behavior, not in the charm itself.

There's an upstream commit located at [0] which fixes the problem. In a nutshell, stopping rsyslogd on an upstart system removes the /dev/log socket, so the swift service's attempt to log a message fails. Swift then tries to log that failure to syslog as well, which fails again. This spirals until the infinite recursion overflows the call stack and the program crashes.

[0] https://github.com/openstack/swift/commit/95efd3f9035ec4141e1b182516f040a59a3e5aa6

Changed in charm-swift-storage:
status: New → Invalid
Changed in swift (Ubuntu):
status: New → Invalid
tags: added: sts
Billy Olsen (billy-olsen) wrote :

This patch is for the xenial version of swift.

description: updated
Billy Olsen (billy-olsen) wrote :

This patch is for the trusty-mitaka version of swift for the trusty-mitaka Ubuntu Cloud Archive

Changed in swift (Ubuntu Trusty):
importance: Undecided → Critical
Changed in swift (Ubuntu Xenial):
importance: Undecided → Critical
tags: added: sts-sru-needed
Billy Olsen (billy-olsen) wrote :

This patch is for the kilo version of swift for the trusty-kilo Ubuntu Cloud Archive.

Billy Olsen (billy-olsen) wrote :

This patch is for the icehouse version of swift in Trusty.

tags: added: ubuntu-sponsors
Andy Whitcroft (apw) wrote :

Confirmed this is already fixed in yakkety and later. Reviewed and sponsored for xenial and trusty.

Robie Basak (racb) wrote :

The patches look superfluously different between Trusty and Xenial, and there are no dep3 headers in the Trusty patch. Was this backported, and if so by whom? Are they both separate cherry-picks from upstream, or otherwise what has been done to make the patch for Trusty work?

Billy Olsen (billy-olsen) wrote :

@Robie - sorry, the trusty patch was indeed messed up. I've rebuilt and retested the patch with appropriate dep3 headers and such, and you'll find it more in line with the other patch versions. The trusty one varies slightly due to the unit test import differences (I tried to keep the same style) and the method signature for logging in the trusty version.

Robie Basak (racb) wrote :

15:07 <rbasak> wolsen: around? You need a sponsor for bug 1683076 I think?

15:08 <rbasak> wolsen: I suggest three minor changes to your trusty debdiff: https://git.launchpad.net/~racb/ubuntu/+source/swift/log/?h=lp1683076-trusty

15:09 <rbasak> The dep3 changes apply to your xenial debdiff too I think. But I'll accept without if you wish.

Can you find a sponsor for the Trusty debdiff please, and then I can accept the SRU?

Billy Olsen (billy-olsen) wrote :

Hey Robie, here's an updated patch for Xenial. I'll update the patch for Trusty next and find an additional sponsor to review it.

Billy Olsen (billy-olsen) wrote :

Updated the Origin field in the patch to indicate that the patch for xenial is a clean cherry-pick (e.g. upstream instead of backport, per DEP-3 guidelines).

Billy Olsen (billy-olsen) wrote :

Here's a patch for trusty with updated DEP-3 headers and white space change log removal. I'll seek an additional sponsor for trusty per:

 <rbasak> wolsen: since you've rewritten the Trusty patch I don't feel that I can accept it myself as the SRU reviewer - it needs another sponsor +1.

Billy Olsen (billy-olsen) wrote :

Updated trusty-kilo patch with corrected DEP-3 headers

Billy Olsen (billy-olsen) wrote :

Updated trusty-mitaka patch with corrected DEP-3 headers

Changed in swift (Ubuntu Trusty):
status: New → Triaged
Changed in swift (Ubuntu Xenial):
status: New → Triaged
Changed in cloud-archive:
status: New → Invalid
status: Invalid → Fix Released
Corey Bryant (corey.bryant) wrote :

Thanks Billy. I've uploaded to trusty (unapproved queue), kilo-staging, and xenial (unapproved queue).

Hello Jill, or anyone else affected,

Accepted swift into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/swift/1.13.1-0ubuntu1.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-trusty to verification-done-trusty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-trusty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in swift (Ubuntu Trusty):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-trusty
Changed in swift (Ubuntu Xenial):
status: Triaged → Fix Committed
tags: added: verification-needed-xenial
Robie Basak (racb) wrote :

Hello Jill, or anyone else affected,

Accepted swift into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/swift/2.7.1-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

James Page (james-page) wrote :

Hello Jill, or anyone else affected,

Accepted swift into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed
tags: added: verification-done-xenial
removed: verification-needed-xenial
tags: added: verification-mitaka-done
removed: verification-mitaka-needed
Billy Olsen (billy-olsen) wrote :

I have completed the verification of this bug fix for trusty-proposed and xenial-proposed packages in the Ubuntu project. Tests involve stopping the logging services (e.g. rsyslog) and ensuring that the services continue to run.

I have also completed the verification of this bug fix for the trusty-mitaka cloud archive proposed pocket. I have not seen the update hit the trusty-kilo cloud archive proposed pocket so will follow up with the Ubuntu OpenStack team on that.

tags: added: verification-done-trusty
removed: verification-needed-trusty
James Page (james-page) wrote :

Hello Jill, or anyone else affected,

Accepted swift into kilo-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:kilo-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-kilo-needed to verification-kilo-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-kilo-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-kilo-needed

The verification of the Stable Release Update for swift has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package swift - 2.7.1-0ubuntu2

---------------
swift (2.7.1-0ubuntu2) xenial; urgency=medium

  * Fix issue where swift daemons crash while writing logs to a stopped
    rsyslogd /dev/log socket. (LP: #1683076)
    - d/patches/fix-infinite-recursion-logging.patch: Cherry-picked from
      upstream stable/newton branch to avoid infinite loops when logging
      while rsyslogd is stopped.

 -- Billy Olsen <email address hidden> Mon, 22 May 2017 12:58:01 -0700

Changed in swift (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package swift - 1.13.1-0ubuntu1.3

---------------
swift (1.13.1-0ubuntu1.3) trusty; urgency=medium

  * Fix issue where swift daemons crash while writing logs to a stopped
    rsyslogd /dev/log socket. (LP: #1683076)
    - d/patches/fix-infinite-recursion-logging.patch: Cherry-picked from
      upstream stable/newton branch to avoid infinite loops when logging
      while rsyslogd is stopped.

 -- Billy Olsen <email address hidden> Mon, 03 Jul 2017 22:22:58 -0700

Changed in swift (Ubuntu Trusty):
status: Fix Committed → Fix Released
Billy Olsen (billy-olsen) wrote :

I was able to finish verification for trusty-proposed/kilo cloud archive pocket this evening. Stop rsyslogd while swift services are running and verify the {object,account,container}-replicator services do not die due to the inability to write to the logs.

tags: added: verification-kilo-done
removed: verification-kilo-needed
James Page (james-page) wrote :

The verification of the Stable Release Update for swift has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package swift - 2.2.2-0ubuntu1.3~cloud1
---------------

 swift (2.2.2-0ubuntu1.3~cloud1) trusty-kilo; urgency=medium
 .
   * Fix issue where swift daemons crash while writing logs to a stopped
     rsyslogd /dev/log socket. (LP: #1683076)
     - d/patches/fix-infinite-recursion-logging.patch: Cherry-picked from
       upstream stable/newton branch to avoid infinite loops when logging
       while rsyslogd is stopped.

James Page (james-page) wrote :

The verification of the Stable Release Update for swift has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package swift - 2.7.1-0ubuntu2~cloud0
---------------

 swift (2.7.1-0ubuntu2~cloud0) trusty-mitaka; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 swift (2.7.1-0ubuntu2) xenial; urgency=medium
 .
   * Fix issue where swift daemons crash while writing logs to a stopped
     rsyslogd /dev/log socket. (LP: #1683076)
     - d/patches/fix-infinite-recursion-logging.patch: Cherry-picked from
       upstream stable/newton branch to avoid infinite loops when logging
       while rsyslogd is stopped.
