booting should succeed even if vault is unavailable

Bug #1818680 reported by Andrea Ieri on 2019-03-05
This bug affects 2 people
Affects                     Importance   Assigned to
OpenStack ceph-osd charm    High         James Page
vaultlocker                 High         James Page
vaultlocker (Ubuntu)        High         James Page
  Cosmic                    High         James Page
  Disco                     High         James Page

Bug Description

[Impact]
Decryption of vaultlocker-encrypted block devices blocks network-online.target. If Vault is hosted on the same hardware that uses vaultlocker for encryption at rest, the servers will fail to boot fully when all of them are rebooted at the same time.

[Test Case]
Deploy ceph+vaultlocker+vault
Power cycle all servers
Servers never reach multi-user.target, because the vaultlocker-decrypt services block network-online.target, so the LXD containers are never started.

[Regression Potential]
The proposed fix only drops the Before=network-online.target stanza from the vaultlocker-decrypt systemd unit, so the impact should be minimal.

[Original bug report]
If ceph is using vault secrets to encrypt its volumes and vault is not available, booting is not possible without manual intervention, as the ceph-volume and vaultlocker-decrypt services will hang forever.
In case of a full cloud outage, bootstrapping the mysql and vault nodes will require quite a bit of manual intervention, as all required nodes will have to be booted in single user mode to bypass the volume decryption services.

Decryption of the ceph volumes should instead time out and allow the rest of the machine to complete the boot sequence.

Andrea Ieri (aieri) on 2019-03-05
description: updated
description: updated
James Page (james-page) wrote :

I incorrectly duped this against bug 1804261 but they are not the same issue.

Changed in charm-ceph-osd:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → James Page (james-page)
status: Confirmed → In Progress
James Page (james-page) wrote :

Reproduced by power cycling a converged deployment; the Before=network-online.target stanza in vaultlocker-decrypt causes the boot to block.

Changed in vaultlocker (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in charm-ceph-osd:
status: In Progress → Invalid
James Page (james-page) wrote :

Marking charm task as invalid as this is a vaultlocker issue.

Changed in vaultlocker (Ubuntu Cosmic):
status: New → Triaged
importance: Undecided → High
Changed in vaultlocker:
status: New → Triaged
importance: Undecided → High
assignee: nobody → James Page (james-page)
status: Triaged → In Progress
James Page (james-page) wrote :

Proposed fix appears to have the desired effect.

Changed in vaultlocker:
status: In Progress → Fix Committed
James Page (james-page) on 2019-03-07
description: updated
James Page (james-page) wrote :

Uploaded to disco development and to cosmic UNAPPROVED for SRU team review.

Changed in vaultlocker (Ubuntu Cosmic):
status: Triaged → In Progress
Changed in vaultlocker (Ubuntu Disco):
assignee: nobody → James Page (james-page)
Changed in vaultlocker (Ubuntu Cosmic):
assignee: nobody → James Page (james-page)
Changed in vaultlocker (Ubuntu Disco):
status: Triaged → In Progress
James Page (james-page) wrote :

As a temporary workaround:

  juju run --application ceph-osd "sudo sed -i /Before=/d /lib/systemd/system/vaultlocker-decrypt@.service"

will update the systemd unit in line with the proposed SRU and allow a full cloud reboot with vaultlocker in use.
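The effect of the sed expression in the workaround can be checked on a scratch file before running it across a deployment. The following is a minimal sketch, assuming GNU sed; the unit file contents are a hypothetical stand-in written for illustration, not the real vaultlocker-decrypt@.service.

```shell
#!/bin/sh
set -eu

# Scratch copy of a unit file. The contents are a hypothetical stand-in,
# NOT the real vaultlocker-decrypt@.service shipped by the package.
unit="$(mktemp)"
cat > "$unit" <<'EOF'
[Unit]
Description=example decrypt unit for %i
After=networking.service
Before=network-online.target

[Service]
Type=oneshot
ExecStart=/bin/true

[Install]
WantedBy=multi-user.target
EOF

# Same expression as the workaround: delete every line containing "Before=".
sed -i '/Before=/d' "$unit"

# Print the result so the remaining directives can be inspected.
cat "$unit"
```

The pattern matches any line that contains "Before=", so it would also remove other Before= orderings if a unit had them; a stricter pattern such as '/^Before=network-online.target$/d' is one option if that matters. On a running system, systemd only picks up an edited unit file after `systemctl daemon-reload` or a reboot.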

Timely unsealing of vaultlocker is also needed due to bug 1804261

James Page (james-page) wrote :

unsealing of vault that is.

Andrea Ieri (aieri) wrote :

Thanks James for following up on this. Will the fix trickle down to Bionic and Xenial as well? We have deployments that use vault on both releases.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vaultlocker - 1.0.3-0ubuntu2

---------------
vaultlocker (1.0.3-0ubuntu2) disco; urgency=medium

  * d/p/unblock-network.online.target.patch: Cherry pick fix to ensure
    that vaultlocker-decrypt@ systemd units don't block the network-online
    target which blocks server boot in the event that the Vault deployment
    being used is not accessible (LP: #1818680).

 -- James Page <email address hidden> Thu, 07 Mar 2019 07:44:58 +0000

Changed in vaultlocker (Ubuntu Disco):
status: In Progress → Fix Released

Hello Andrea, or anyone else affected,

Accepted vaultlocker into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/vaultlocker/1.0.3-0ubuntu1.18.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in vaultlocker (Ubuntu Cosmic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-cosmic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vaultlocker - 1.0.3-0ubuntu1.18.10.1

---------------
vaultlocker (1.0.3-0ubuntu1.18.10.1) cosmic; urgency=medium

  * d/p/unblock-network.online.target.patch: Cherry pick fix to ensure
    that vaultlocker-decrypt@ systemd units don't block the network-online
    target which blocks server boot in the event that the Vault deployment
    being used is not accessible (LP: #1818680).

 -- James Page <email address hidden> Thu, 07 Mar 2019 07:44:58 +0000

Changed in vaultlocker (Ubuntu Cosmic):
status: Fix Committed → Fix Released