booting should succeed even if vault is unavailable

Bug #1818680 reported by Andrea Ieri
This bug affects 2 people
Affects                    Importance   Assigned to
Bionic Backports           Undecided    James Page
OpenStack ceph-osd charm   High         James Page
Ubuntu Cloud Archive       Undecided    Unassigned
  Queens                   High         James Page
vaultlocker                High         James Page
vaultlocker (Ubuntu)       High         James Page
  Cosmic                   High         James Page
  Disco                    High         James Page

Bug Description

[Impact]
Decryption of vaultlocker-encrypted block devices blocks network-online.target. If Vault is hosted on the same hardware that uses vaultlocker for encryption at rest, the servers will fail to boot fully when all of them are rebooted at the same time.

[Test Case]
Deploy ceph+vaultlocker+vault
Power cycle all servers
Servers never reach multi-user.target, because the vaultlocker-decrypt services block network-online.target and the LXD containers are therefore never started.

[Regression Potential]
The proposed fix drops the Before=network-online.target stanza from the vaultlocker-decrypt systemd unit, so the regression potential is minimal.
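
To see whether a given unit file still carries the ordering described above, something along these lines may help. It is only a sketch: the function name and the sample unit contents are made up for illustration and are not the real vaultlocker-decrypt@.service.

```shell
#!/bin/sh
# Sketch: report whether a systemd unit file orders itself before
# network-online.target. The helper name is hypothetical.

# Returns 0 if the file has a Before= line that names network-online.target.
orders_before_network_online() {
    grep -Eq '^Before=(.*[[:space:]])?network-online\.target([[:space:]]|$)' "$1"
}

# Hypothetical stand-in for the shipped unit; NOT the real file contents.
sample_unit=$(mktemp)
cat > "$sample_unit" <<'EOF'
[Unit]
Description=hypothetical decrypt unit for %i
Before=network-online.target

[Service]
Type=oneshot
ExecStart=/bin/true
EOF

if orders_before_network_online "$sample_unit"; then
    echo "unit is ordered before network-online.target"
else
    echo "unit is not ordered before network-online.target"
fi
```

On a deployed machine the function would be pointed at the installed unit file rather than at the scratch copy.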

[Original bug report]
If ceph is using vault secrets to encrypt its volumes and vault is not available, booting is not possible without manual intervention, as the ceph-volume and vaultlocker-decrypt services will hang forever.
In case of a full cloud outage, bootstrapping the mysql and vault nodes will require quite a bit of manual intervention, as all required nodes will have to be booted in single-user mode to bypass the volume decryption services.

Decryption of the ceph volumes should instead time out and allow the rest of the machine to complete the boot sequence.
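
The behaviour requested here, a bounded wait followed by carrying on, can be sketched with coreutils `timeout`. This only illustrates the idea in the original report; the fix discussed in this bug instead drops the Before=network-online.target ordering. `sleep 30` is a placeholder for a decrypt call that hangs while Vault is unreachable, and the 2-second limit is arbitrary.

```shell
#!/bin/sh
# Sketch: bound a blocking step so the caller can continue after a deadline.

# `sleep 30` stands in for a decrypt step that hangs (assumption, not the
# real vaultlocker invocation). coreutils `timeout` exits with status 124
# when the time limit is hit.
status=0
timeout 2 sleep 30 || status=$?

if [ "$status" -eq 124 ]; then
    echo "decrypt step timed out; continuing without the volume"
else
    echo "decrypt step exited with status $status"
fi
```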

Andrea Ieri (aieri)
description: updated
description: updated
James Page (james-page) wrote :

I incorrectly duped this against bug 1804261 but they are not the same issue.

Changed in charm-ceph-osd:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → James Page (james-page)
status: Confirmed → In Progress
James Page (james-page) wrote :

Reproduced by power cycling a converged deployment; the Before=network-online.target stanza in vaultlocker-decrypt causes the boot to block.

Changed in vaultlocker (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in charm-ceph-osd:
status: In Progress → Invalid
James Page (james-page) wrote :

Marking charm task as invalid as this is a vaultlocker issue.

Changed in vaultlocker (Ubuntu Cosmic):
status: New → Triaged
importance: Undecided → High
Changed in vaultlocker:
status: New → Triaged
importance: Undecided → High
assignee: nobody → James Page (james-page)
status: Triaged → In Progress
James Page (james-page) wrote :

Proposed fix appears to have the desired effect.

Changed in vaultlocker:
status: In Progress → Fix Committed
James Page (james-page)
description: updated
James Page (james-page) wrote :

Uploaded to disco development and to cosmic UNAPPROVED for SRU team review.

Changed in vaultlocker (Ubuntu Cosmic):
status: Triaged → In Progress
Changed in vaultlocker (Ubuntu Disco):
assignee: nobody → James Page (james-page)
Changed in vaultlocker (Ubuntu Cosmic):
assignee: nobody → James Page (james-page)
Changed in vaultlocker (Ubuntu Disco):
status: Triaged → In Progress
James Page (james-page) wrote :

As a temporary workaround:

  juju run --application ceph-osd "sudo sed -i /Before=/d /lib/systemd/system/vaultlocker-decrypt@.service"

will update the systemd unit in line with the proposed SRU and allow a full cloud reboot with vaultlocker in use.
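
Before running that across a whole application, the effect of the sed expression can be checked on a scratch file. This is a sketch only: the unit contents below are invented for illustration and are not the shipped vaultlocker-decrypt@.service, and `sed -i` is used in its GNU form as found on Ubuntu.

```shell
#!/bin/sh
# Dry run of the workaround's sed expression on a scratch copy of a unit file.

# Hypothetical unit contents, for illustration only.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
[Unit]
Description=hypothetical decrypt unit for %i
Before=network-online.target

[Service]
Type=oneshot
ExecStart=/bin/true
EOF

before_count=$(grep -c 'Before=' "$scratch" || true)

# Same expression as the workaround: delete every line containing "Before=".
sed -i '/Before=/d' "$scratch"

after_count=$(grep -c 'Before=' "$scratch" || true)

echo "Before= lines: $before_count -> $after_count"
cat "$scratch"
```

The expression removes every line that contains "Before=", not only the network-online.target one. An edit made directly under /lib/systemd/system is picked up at the next boot (or after `systemctl daemon-reload`) and is overwritten when the package is upgraded.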

Timely unsealing of vaultlocker is also needed due to bug 1804261

James Page (james-page) wrote :

unsealing of vault that is.

Andrea Ieri (aieri) wrote :

Thanks James for following up on this. Will the fix trickle down to Bionic and Xenial as well? We have deployments that use vault on both releases.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vaultlocker - 1.0.3-0ubuntu2

---------------
vaultlocker (1.0.3-0ubuntu2) disco; urgency=medium

  * d/p/unblock-network.online.target.patch: Cherry pick fix to ensure
    that vaultlocker-decrypt@ systemd units don't block the network-online
    target which blocks server boot in the event that the Vault deployment
    being used is not accessible (LP: #1818680).

 -- James Page <email address hidden> Thu, 07 Mar 2019 07:44:58 +0000

Changed in vaultlocker (Ubuntu Disco):
status: In Progress → Fix Released
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Andrea, or anyone else affected,

Accepted vaultlocker into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/vaultlocker/1.0.3-0ubuntu1.18.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in vaultlocker (Ubuntu Cosmic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-cosmic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vaultlocker - 1.0.3-0ubuntu1.18.10.1

---------------
vaultlocker (1.0.3-0ubuntu1.18.10.1) cosmic; urgency=medium

  * d/p/unblock-network.online.target.patch: Cherry pick fix to ensure
    that vaultlocker-decrypt@ systemd units don't block the network-online
    target which blocks server boot in the event that the Vault deployment
    being used is not accessible (LP: #1818680).

 -- James Page <email address hidden> Thu, 07 Mar 2019 07:44:58 +0000

Changed in vaultlocker (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Alvaro Uria (aluria) wrote :

Would it be possible to backport the fix to Bionic (and maybe Xenial as well, since they're both LTS)? Thank you.

James Page (james-page) wrote :

I've uploaded for bionic-backports and raised the associated bug task; with regard to xenial, this will be delivered via the Ubuntu Cloud Archive.

Changed in bionic-backports:
status: New → In Progress
assignee: nobody → James Page (james-page)
Changed in cloud-archive:
status: New → Invalid
James Page (james-page)
Changed in bionic-backports:
status: In Progress → Fix Released
Ryan Beisner (1chb1n)
tags: added: uosci
James Page (james-page)
Changed in vaultlocker:
status: Fix Committed → Fix Released
Corey Bryant (corey.bryant) wrote :

Hello Andrea, or anyone else affected,

Accepted vaultlocker into queens-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:queens-proposed
  sudo apt-get update

Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-queens-needed to verification-queens-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-queens-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-queens-needed
Chris MacNaughton (chris.macnaughton) wrote :

I have validated that using proposed allows the machines to boot even when Vault is sealed by rebooting all of the instances in a deployment using vault + ceph-osd with encryption.

tags: added: verification-queens-done
removed: verification-queens-needed
Corey Bryant (corey.bryant) wrote : Update Released

The verification of the Stable Release Update for vaultlocker has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package vaultlocker - 1.0.4-0ubuntu0.19.04.1~ubuntu18.04.1~cloud0
---------------

 vaultlocker (1.0.4-0ubuntu0.19.04.1~ubuntu18.04.1~cloud0) xenial-queens; urgency=medium
 .
   * New upstream release for the Ubuntu Cloud Archive.
 .
 vaultlocker (1.0.4-0ubuntu0.19.04.1~ubuntu18.04.1) bionic-backports; urgency=medium
 .
   * No-change backport to bionic
 .
 vaultlocker (1.0.4-0ubuntu0.19.04.1) disco; urgency=medium
 .
   * New upstream point release including fix for an issue when
     vaultlocker blocks boot if interfaces are in a down or no-carrier
     state (LP: #1838607):
     - d/p/*: Drop, all included in new point release.
 .
 vaultlocker (1.0.3-0ubuntu2) disco; urgency=medium
 .
   * d/p/unblock-network.online.target.patch: Cherry pick fix to ensure
     that vaultlocker-decrypt@ systemd units don't block the network-online
     target which blocks server boot in the event that the Vault deployment
     being used is not accessible (LP: #1818680).
 .
 vaultlocker (1.0.3-0ubuntu1) cosmic; urgency=medium
 .
   * Initial release.
