vaultlocker service fails when some interfaces are DOWN with NO-CARRIER

Bug #1838607 reported by Nicolas Pochet on 2019-08-01
This bug affects 1 person
Affects                  Importance   Assigned to
Bionic Backports         Undecided    James Page
vaultlocker              High         Unassigned
vaultlocker (Ubuntu)     High         James Page
  Disco                  High         James Page
  Eoan                   High         James Page

Bug Description

[Impact]
Systems with block device encryption managed using vaultlocker will not boot if any interfaces are in a DOWN or NO-CARRIER state

[Test Case]
Deploy OpenStack (using the charms) with block device encryption via vaultlocker.
Unplug or disable a network interface which is configured on the system.
Reboot: the server will time out unlocking block devices on boot.

[Regression Potential]
Low: the change simply removes the dependency on systemd-networkd-wait-online; the vaultlocker units still start after networking.service, and the daemon will retry connecting to Vault if the required network interface has not come up yet.
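The "retry until Vault is reachable" behaviour described above can be illustrated with a small shell helper. This is only a sketch of the idea: vaultlocker's own retry logic is not shown in this report, and the helper name `retry` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to MAX_ATTEMPTS times, sleeping
# DELAY seconds between failed attempts. Returns 0 as soon as the command
# succeeds, or 1 once all attempts have failed.
# Usage: retry MAX_ATTEMPTS DELAY command [args...]
retry() {
    local max_attempts=$1 delay=$2 attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

For example, `retry 5 2 curl -sf https://vault.example:8200/v1/sys/health` would poll Vault's health endpoint until it answers (the hostname is a placeholder). Retrying in the service keeps boot moving even when a non-essential link never gets carrier, instead of blocking on every managed link.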

[Original Bug Report]
On some hosts, there may be interfaces that are DOWN with
NO-CARRIER. In this case, systemd-networkd-wait-online will time out
and fail, and vaultlocker therefore fails too.
If vaultlocker fails, the encrypted partitions may not get
mounted.

Nicolas Pochet (npochet) wrote :

https://github.com/openstack-charmers/vaultlocker/pull/7 tries to address the issue by removing the dependency on systemd-networkd-wait-online.
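To check whether an installed unit file still declares this dependency, a grep over its dependency directives is enough. A minimal sketch; the helper name is hypothetical and the exact unit names shipped by the package are not given in this report (`systemctl list-unit-files 'vaultlocker*'` lists them).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a systemd unit file declares an
# ordering or requirement dependency on systemd-networkd-wait-online.
# Prints the matching directive lines; returns 0 if found, 1 otherwise.
has_wait_online_dep() {
    local unit_file=$1
    grep -E '^[[:space:]]*(After|Wants|Requires|BindsTo)=.*systemd-networkd-wait-online' "$unit_file"
}
```

Usage: `has_wait_online_dep /lib/systemd/system/<unit>`, where `<unit>` is one of the names reported by `systemctl list-unit-files`.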

Ryan Beisner (1chb1n) wrote :

FYI py34 tests failed on the PR, investigating.

Ryan Beisner (1chb1n) wrote :

FYI, I've proposed test updates in https://github.com/openstack-charmers/vaultlocker/pull/8; these will need to be rebased into your change once they merge.

Nicolas Pochet (npochet) wrote :

The change was rebased and the tests are passing.
Could someone review https://github.com/openstack-charmers/vaultlocker/pull/7 ?

James Page (james-page) on 2019-09-25
Changed in vaultlocker:
status: New → Fix Released
importance: Undecided → High
Nicolas Pochet (npochet) wrote :

Could we please backport it and make it available to Bionic?
I hit this again on another customer deployment.

James Page (james-page) on 2019-12-05
Changed in vaultlocker (Ubuntu Eoan):
assignee: nobody → James Page (james-page)
Changed in vaultlocker (Ubuntu Disco):
assignee: nobody → James Page (james-page)
Changed in vaultlocker (Ubuntu):
assignee: nobody → James Page (james-page)
status: New → Triaged
Changed in vaultlocker (Ubuntu Disco):
status: New → Triaged
Changed in vaultlocker (Ubuntu Eoan):
status: New → Triaged
Changed in vaultlocker (Ubuntu):
importance: Undecided → High
Changed in vaultlocker (Ubuntu Disco):
importance: Undecided → High
Changed in vaultlocker (Ubuntu Eoan):
importance: Undecided → High
Changed in vaultlocker (Ubuntu):
status: Triaged → In Progress
James Page (james-page) on 2019-12-05
Changed in bionic-backports:
status: New → In Progress
assignee: nobody → James Page (james-page)
James Page (james-page) wrote :

I've uploaded a new point release of vaultlocker for SRU consideration; it includes only two changes: a previous fix that was already carried as a patch (1.0.3-0ubuntu2) and the fix for this issue.

description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vaultlocker - 1.0.4-0ubuntu1

---------------
vaultlocker (1.0.4-0ubuntu1) focal; urgency=medium

  * New upstream release (LP: #1838607):
    - d/p/*: Drop all patches, included in release.

 -- James Page <email address hidden> Thu, 05 Dec 2019 14:49:04 +0000

Changed in vaultlocker (Ubuntu):
status: In Progress → Fix Released

Hello Nicolas, or anyone else affected,

Accepted vaultlocker into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/vaultlocker/1.0.4-0ubuntu0.19.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.
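Enabling -proposed amounts to adding one more APT source line for the release's -proposed pocket. A minimal sketch that prints that line; the helper name is hypothetical, the default mirror URL is an assumption, and the wiki page above remains the authoritative procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an APT source line that enables the
# -proposed pocket for a given Ubuntu series. The mirror defaults to
# archive.ubuntu.com (an assumption); pass another mirror as $2.
proposed_source_line() {
    local series=$1 mirror=${2:-http://archive.ubuntu.com/ubuntu/}
    printf 'deb %s %s-proposed main restricted universe multiverse\n' "$mirror" "$series"
}
```

Usage: `proposed_source_line eoan | sudo tee /etc/apt/sources.list.d/proposed.list`, then `sudo apt update` and `sudo apt install vaultlocker/eoan-proposed`, which pulls only this package from -proposed.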

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in vaultlocker (Ubuntu Eoan):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-eoan
Changed in vaultlocker (Ubuntu Disco):
status: Triaged → Fix Committed
tags: added: verification-needed-disco
Brian Murray (brian-murray) wrote :

Hello Nicolas, or anyone else affected,

Accepted vaultlocker into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/vaultlocker/1.0.4-0ubuntu0.19.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Nicolas Pochet (npochet) wrote :

Validation for disco.
I created a disco VM, configured vault on another machine and installed vaultlocker from the repo.
Used vaultlocker to encrypt a partition:

lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 149.7M 1 loop /snap/vault/1822
loop1 7:1 0 54.9M 1 loop /snap/lxd/12631
loop2 7:2 0 89.1M 1 loop /snap/core/8268
sda 8:0 0 20G 0 disk
└─sda1 8:1 0 20G 0 part /
sdb 8:16 0 5G 0 disk
└─sdb1 8:17 0 5G 0 part
  └─crypt-af7376f2-6640-41cb-98d4-2fcaeaa89736 253:0 0 5G 0 crypt /mnt/test

As described in the original bug, there's an interface that is DOWN with NO-CARRIER:
ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ad:4f:a6 brd ff:ff:ff:ff:ff:ff
3: ens8: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:2b:c5:56 brd ff:ff:ff:ff:ff:ff
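Spotting the NO-CARRIER links in output like the above can be scripted. A minimal sketch with a hypothetical helper name; it only reports the kernel link flags, so whether systemd-networkd actually manages those links still has to be checked with `networkctl list`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of links whose flags include
# NO-CARRIER, reading `ip link` (or `ip -o link`) output on stdin.
# Only the "N: name: <FLAGS>" header line of each link is matched, so
# continuation lines such as "link/ether ..." are ignored.
no_carrier_links() {
    awk '/^[0-9]+: / && /<[^>]*NO-CARRIER[^>]*>/ {
        name = $2
        sub(/:$/, "", name)     # strip the trailing colon
        sub(/@.*/, "", name)    # strip an "@parent" suffix (VLANs, veths)
        print name
    }'
}
```

Usage: `ip link | no_carrier_links`. On the VM above this would print `ens8`.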

When rebooting, we can see the following in the logs:
grep mnt /var/log/syslog
Jan 14 14:17:37 vm1 systemd[1]: Dependency failed for /mnt/test.
Jan 14 14:17:37 vm1 systemd[1]: mnt-test.mount: Job mnt-test.mount/start failed with result 'dependency'.
Jan 14 14:17:42 vm1 systemd[1]: Mounting /mnt/test...
Jan 14 14:17:42 vm1 systemd[1]: Mounted /mnt/test.

The version of vaultlocker is:
dpkg -l | grep vault
ii vaultlocker 1.0.3-0ubuntu2 all Secure storage of dm-crypt keys in Hashicorp Vault

After an upgrade to disco-proposed for the vaultlocker package:
dpkg -l | grep vault
ii vaultlocker 1.0.4-0ubuntu0.19.04.1 all Secure storage of dm-crypt keys in Hashicorp Vault
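When collecting verification results from several machines, the installed version can be pulled out of pasted `dpkg -l` output. A minimal sketch with a hypothetical helper name; on a live system `dpkg-query -W -f='${Version}\n' vaultlocker` gives the same answer directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the installed version of a package from
# `dpkg -l` output on stdin. Only rows with status "ii" (installed) whose
# name matches exactly are used; an ":arch" suffix on the name is ignored.
installed_version() {
    local pkg=$1
    awk -v pkg="$pkg" '$1 == "ii" {
        name = $2
        sub(/:.*/, "", name)    # drop ":amd64"-style architecture suffixes
        if (name == pkg) print $3
    }'
}
```

Usage: `dpkg -l | installed_version vaultlocker`. Matching the name exactly avoids the false hits a plain `grep vault` can produce.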

Rebooting the machine does not show the same errors in the logs:

grep mnt /var/log/syslog
Jan 14 14:24:30 vm1 systemd[983]: mnt-test.mount: Succeeded.
Jan 14 14:27:14 vm1 systemd[1]: Mounting /mnt/test...
Jan 14 14:27:14 vm1 systemd[1]: Mounted /mnt/test.
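The before/after comparison above boils down to looking for two systemd messages. A minimal sketch with a hypothetical helper name; feeding it `journalctl -b` output restricts the check to the current boot, whereas /var/log/syslog can span several boots.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a mount point's boot outcome from log
# text on stdin. Prints FAIL if systemd logged
# "Dependency failed for <mountpoint>.", OK if it logged
# "Mounted <mountpoint>." without such a failure, UNKNOWN otherwise.
check_mount_log() {
    local mountpoint=$1 log
    log=$(cat)
    if printf '%s\n' "$log" | grep -qF "Dependency failed for ${mountpoint}."; then
        echo FAIL
    elif printf '%s\n' "$log" | grep -qF "Mounted ${mountpoint}."; then
        echo OK
    else
        echo UNKNOWN
    fi
}
```

Usage: `journalctl -b | check_mount_log /mnt/test`. The first disco log shows a failed mount job followed by a later successful mount; this sketch deliberately reports FAIL for that case, because the boot-time mount job did fail.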

From the original bug's point of view, this patch fixes the issue in disco-proposed.

tags: added: verification-done-disco
removed: verification-needed-disco
Nicolas Pochet (npochet) wrote :

Validation for eoan.
I created an eoan VM, configured vault on another machine and installed vaultlocker from the repo.
Used vaultlocker to encrypt a partition:

lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 89.1M 1 loop /snap/core/8268
loop1 7:1 0 54.9M 1 loop /snap/lxd/12631
sda 8:0 0 20G 0 disk
└─sda1 8:1 0 20G 0 part /
sdb 8:16 0 5G 0 disk
└─sdb1 8:17 0 5G 0 part
  └─crypt-127f3167-84fa-430b-a892-59fd704cdb6a 253:0 0 5G 0 crypt /mnt/test

As described in the original bug, there's an interface that is DOWN with NO-CARRIER:
ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ad:4f:a6 brd ff:ff:ff:ff:ff:ff
3: ens8: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:06:3e:6c brd ff:ff:ff:ff:ff:ff

When rebooting, we can see the following in the logs:
grep mnt /var/log/syslog
Jan 14 15:30:32 vm1 systemd[1]: Dependency failed for /mnt/test.
Jan 14 15:30:32 vm1 systemd[1]: mnt-test.mount: Job mnt-test.mount/start failed with result 'dependency'.

The version of vaultlocker is:
dpkg -l | grep vault
ii vaultlocker 1.0.3-0ubuntu2 all Secure storage of dm-crypt keys in Hashicorp Vault

After an upgrade to eoan-proposed for the vaultlocker package:
dpkg -l | grep vault
ii vaultlocker 1.0.4-0ubuntu0.19.10.1 all Secure storage of dm-crypt keys in Hashicorp Vault

Rebooting the machine does not show the same errors in the logs:

grep mnt /var/log/syslog
Jan 14 15:44:04 vm1 systemd[926]: mnt-test.mount: Succeeded.
Jan 14 15:46:46 vm1 systemd[1]: Mounting /mnt/test...
Jan 14 15:46:46 vm1 systemd[1]: Mounted /mnt/test.

From the original bug's point of view, this patch fixes the issue in eoan-proposed.

tags: added: verification-done-eoan
removed: verification-needed-eoan
Nicolas Pochet (npochet) wrote :

James, Brian,

Now that the change has been tested and validated for Disco and Eoan, I guess we'll have to wait 7 more days to have it in {disco,eoan}-updates.
What about Bionic? Is it possible to backport it?

The verification of the Stable Release Update for vaultlocker has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vaultlocker - 1.0.4-0ubuntu0.19.10.1

---------------
vaultlocker (1.0.4-0ubuntu0.19.10.1) eoan; urgency=medium

  * New upstream point release including fix for an issue when
    vaultlocker blocks boot if interfaces are in a down or no-carrier
    state (LP: #1838607):
    - d/p/*: Drop, all included in new point release.

 -- James Page <email address hidden> Thu, 05 Dec 2019 16:22:36 +0000

Changed in vaultlocker (Ubuntu Eoan):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vaultlocker - 1.0.4-0ubuntu0.19.04.1

---------------
vaultlocker (1.0.4-0ubuntu0.19.04.1) disco; urgency=medium

  * New upstream point release including fix for an issue when
    vaultlocker blocks boot if interfaces are in a down or no-carrier
    state (LP: #1838607):
    - d/p/*: Drop, all included in new point release.

 -- James Page <email address hidden> Thu, 05 Dec 2019 16:22:58 +0000

Changed in vaultlocker (Ubuntu Disco):
status: Fix Committed → Fix Released
James Page (james-page) wrote :

I've uploaded a backport of 1.0.4-0ubuntu0.19.04.1 to the bionic-backports queue for backports team review

Nicolas Pochet (npochet) wrote :

Validation for Bionic using the package from James Page's PPA.
I created a bionic VM, configured vault on another machine and installed vaultlocker from the repo.
Used vaultlocker to encrypt a partition:
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 20G 0 disk
└─sda1 8:1 0 20G 0 part /
sdb 8:16 0 5G 0 disk
└─sdb1 8:17 0 5G 0 part
  └─crypt-be9500f2-b3cd-4027-b7db-435a6bb8cd90 253:0 0 5G 0 crypt /mnt/test

As described in the original bug, there's an interface that is DOWN with NO-CARRIER:
ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ad:4f:a6 brd ff:ff:ff:ff:ff:ff
3: ens8: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:06:3e:6c brd ff:ff:ff:ff:ff:ff

When rebooting, we can see the following in the logs:
grep mnt /var/log/syslog
Jan 15 11:07:33 vm1 systemd[1]: Dependency failed for /mnt/test.
Jan 15 11:07:33 vm1 systemd[1]: mnt-test.mount: Job mnt-test.mount/start failed with result 'dependency'.

The version of vaultlocker is:
dpkg -l | grep vaultlocker
ii vaultlocker 1.0.3-0ubuntu1.18.10.1~ubuntu18.04.1 all Secure storage of dm-crypt keys in Hashicorp Vault

I upgraded the vaultlocker package from James Page's PPA:
sudo apt-add-repository ppa:james-page/bionic
sudo apt update
sudo apt upgrade

dpkg -l | grep vaultlocker
ii vaultlocker 1.0.4-0ubuntu0.19.04.1~ubuntu18.04.1~ppa1 all Secure storage of dm-crypt keys in Hashicorp Vault

Rebooting the machine does not show the same errors in the logs:

grep mnt /var/log/syslog
Jan 15 11:34:09 vm1 systemd[1]: Mounting /mnt/test...
Jan 15 11:34:09 vm1 systemd[1]: Mounted /mnt/test.

From the original bug's point of view, the version proposed by James Page in his PPA fixes the issue. This package is in the queue to be backported to bionic-backports.

Edward Hope-Morley (hopem) wrote :

I'm trying to understand why I do not see this issue. I have several interfaces DOWN and vaultlocker does not have this issue on boot:

root@chespin:~# ip a s| grep ": eno"
2: eno1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
4: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
5: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
6: eno49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master br-eno49 state UP group default qlen 1000
7: eno50: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
root@chespin:~# dpkg -l| grep vaultlocker
ii vaultlocker 1.0.3-0ubuntu1.18.10.1~ubuntu18.04.1 all Secure storage of dm-crypt keys in Hashicorp Vault
root@chespin:~# grep "Dependency failed" /var/log/syslog*
root@chespin:~#

It also appears you are using a VM, so I wonder if that somehow affects your issue. The only other vaultlocker boot issue I am aware of is bug 1804261, where it can time out reaching the Vault API, but that is a different problem.


Hi Ed,

I looked into this, and the issue only happens if such interfaces (with state DOWN and NO-CARRIER) are managed by systemd-networkd (check with 'networkctl list').

Per systemd-networkd-wait-online.service man page [1]:

'By default, it will wait for all links it is aware of and which are managed by systemd-networkd.service(8) to be fully configured or failed,'

[1] https://www.freedesktop.org/software/systemd/man/systemd-networkd-wait-online.service.html
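Following that man page, the links wait-online will block on are the ones whose SETUP column in `networkctl list` is anything other than "unmanaged". A minimal sketch with a hypothetical helper name that extracts them:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the links systemd-networkd manages, i.e.
# rows of `networkctl list` output (on stdin) whose SETUP column (5th
# field) is not "unmanaged". The header row and the trailing
# "N links listed." line are skipped because their first field is not
# a numeric IDX or they have fewer than five fields.
managed_links() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 5 && $5 != "unmanaged" { print $2 }'
}
```

Usage: `networkctl list | managed_links`. If a link printed here is also DOWN with NO-CARRIER, wait-online can be expected to time out, which is the situation that triggered this bug.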

The issue can be reproduced in a VM with a NIC configured in libvirt XML as "<interface type='ethernet'><link state='down'/></interface>" that is managed by netplan. Steps below.

Hope this helps,
Mauricio

---

Create a VM (bionic):
---

$ uvt-simplestreams-libvirt sync release=bionic arch=amd64

$ uvt-kvm create --cpu 2 --memory 2048 --disk 4 --password password bionic release=bionic arch=amd64
$ uvt-kvm wait bionic

Give it an ethernet interface with link down:
---

$ virsh edit bionic
...
    <interface type='ethernet'>
       <model type='virtio'/>
       <link state='down'/>
    </interface>
...

Re-start the VM:
---

$ virsh shutdown bionic
$ virsh start bionic

$ uvt-kvm wait bionic
$ uvt-kvm ssh bionic

Check that systemd-networkd-wait-online.service is happy with 'ens3' only:
---

By default, only the 'ens3' interface is configured in netplan
(thus managed by systemd-networkd, default renderer in netplan).

$ ip -o l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000\ link/ether 52:54:00:b8:ff:f1 brd ff:ff:ff:ff:ff:ff
3: ens7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether 52:54:00:fb:c6:b6 brd ff:ff:ff:ff:ff:ff

$ grep -r ens[0-9]: /etc/netplan
/etc/netplan/50-cloud-init.yaml: ens3:

$ ls -1 /run/systemd/network/*.network
/run/systemd/network/10-netplan-ens3.network

Notice that the SETUP status of the other (new) 'ens7' interface is 'unmanaged':

$ sudo networkctl list
IDX LINK TYPE OPERATIONAL SETUP
  1 lo loopback carrier unmanaged
  2 ens3 ether routable configured
  3 ens7 ether off unmanaged

3 links listed.

Thus, despite being 'state DOWN', systemd-networkd-wait-online.service doesn't care, and succeeds:

$ systemctl status systemd-networkd-wait-online.service | grep Process
  Process: 611 ExecStart=/lib/systemd/systemd-networkd-wait-online (code=exited, status=0/SUCCESS)

Check that systemd-networkd-wait-online.service gets unhappy with 'ens7' too:
---

If you just configure 'ens7' in netplan, even without assigning or requesting an IP address:

$ cat <<EOF | sudo tee /etc/netplan/ens7.yaml
network:
  ethernets:
    ens7:
      match:
        macaddress: 52:54:00:fb:c6:b6
      set-name: ens7
  version: 2
EOF

$ sudo netplan apply
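When repeating this reproduction for other NICs, the stanza above can be generated instead of typed. A minimal sketch; the helper name is hypothetical and it emits the same minimal match-by-MAC configuration, with no addressing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal netplan stanza that matches a NIC
# by MAC address and names it, without assigning any IP address. This is
# enough for systemd-networkd to start managing the link.
netplan_stub() {
    local iface=$1 mac=$2
    cat <<EOF
network:
  ethernets:
    ${iface}:
      match:
        macaddress: ${mac}
      set-name: ${iface}
  version: 2
EOF
}
```

Usage: `netplan_stub ens7 "$(cat /sys/class/net/ens7/address)" | sudo tee /etc/netplan/ens7.yaml`, followed by `sudo netplan apply`.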

$ grep -r ens[0-9]: /etc/netplan
/etc/netplan/ens7.yaml: ens7:
/etc/netplan/50-cloud-init.yaml: ens3:

$ ls -1 /run/systemd/network/*.network...


James Page (james-page) on 2020-02-12
Changed in bionic-backports:
status: In Progress → Fix Released
Edward Hope-Morley (hopem) wrote :

@mfo thanks, yeah: the piece of info I was missing is that the interfaces need to be down AND have a netplan configuration for the issue to trigger, which was not the case in my repro.
