Live migration fails in tls-everywhere scenario for existing instances with UseTLSTransportForNbd: True

Bug #1900986 reported by Martin Schuppert
This bug affects 1 person
Affects    Status         Importance   Assigned to        Milestone
tripleo    Fix Released   High         Martin Schuppert

Bug Description

Live migration in a tls-everywhere scenario fails with:

2020-09-17 15:44:09.067 8 ERROR nova.virt.libvirt.driver [req-b515c1c4-f4e0-4a0d-a240-19b2b96e71a9 3452fb519c0e4113afbb5e7da113cdbe e46388fad64144cf9417a2a7f34d4c75 - default default] [instance: aff4e47c-ab7d-4473-af6c-fe5c646335f3] Migration operation has aborted
2020-09-17 15:44:09.338 8 ERROR nova.virt.libvirt.driver [req-b515c1c4-f4e0-4a0d-a240-19b2b96e71a9 3452fb519c0e4113afbb5e7da113cdbe e46388fad64144cf9417a2a7f34d4c75 - default default] [instance: aff4e47c-ab7d-4473-af6c-fe5c646335f3] Live Migration failure: internal error: unable to execute QEMU command 'object-add': Unable to access credentials /etc/pki/qemu/ca-cert.pem: No such file or directory: libvirtError: internal error: unable to execute QEMU command 'object-add': Unable to access credentials /etc/pki/qemu/ca-cert.pem: No such file or directory

The issue is that the certificates for the TLS NBD block migration are created during the update.
They did not exist in the libvirtd container when the existing instances were created. When the
libvirt container is created, the certificates are merged into the container's directory tree using
the kolla_config mechanism; they are not a direct bind mount from the host. Therefore the qemu
processes of the existing instances do not see these files, and the NBD setup fails with the error
above. This can be confirmed by running strace on the qemu process of an instance created before
the update while it is being live migrated:

116406 stat("/etc/pki/qemu/ca-cert.pem", 0x7fff6f4ec390) = -1 ENOENT (No such file or directory)
116406 sendmsg(25, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="{\"id\": \"libvirt-2611\", \"error\": {\"class\": \"GenericError\", \"desc\": \"Unable to access credentials /etc/pki/qemu/ca-cert.pem: No such file or directory\"}}\r\n", iov_len=153}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 153
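The same condition can also be checked without strace by looking at the certificate directory through the qemu process's own view of the filesystem. The following Python sketch assumes that `/proc/<pid>/root` (standard Linux procfs, readable as root) exposes the root filesystem as seen by that process, and that the NBD credentials follow QEMU's `tls-creds-x509` file naming under `/etc/pki/qemu`. The function name `missing_credentials` is hypothetical and not part of any TripleO tooling.

```python
from pathlib import Path

# Assumed file names, following QEMU's tls-creds-x509 directory layout.
QEMU_TLS_FILES = (
    "ca-cert.pem",
    "server-cert.pem",
    "server-key.pem",
    "client-cert.pem",
    "client-key.pem",
)


def missing_credentials(root, cert_dir="/etc/pki/qemu", names=QEMU_TLS_FILES):
    """Return the credential paths that do not exist under `root`.

    `root` is the filesystem root as seen by a process, for example
    /proc/<pid>/root for a running qemu process (hypothetical usage).
    """
    # Join the cert dir relative to `root` instead of the host's "/".
    base = Path(root) / cert_dir.lstrip("/")
    return [str(Path(cert_dir) / name) for name in names
            if not (base / name).exists()]
```

Run as root on the compute node, `missing_credentials(f"/proc/{pid}/root")` with the pid of an instance's qemu process would, under these assumptions, list `/etc/pki/qemu/ca-cert.pem` for instances created before the update.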

The immediate workaround is to re-run the overcloud deploy with TLS transport for NBD disabled,
which restores the same configuration as before the minor update:

parameter_defaults:
    UseTLSTransportForNbd: False

To transition to UseTLSTransportForNbd: True, THT needs to be enhanced to support the
following transition path:
1) create the required NBD certificates even with "UseTLSTransportForNbd: False", or use bind
mounts for the certificate directories instead of merging them into the directory tree on container
creation. Bind mounts would have the additional benefit that no action is required when the NBD
certificates change.
2) migrate all instances once, so that each qemu process runs in an environment that has
all the certificate information
3) enable "UseTLSTransportForNbd: True" for the overcloud deployment

After that, all instances have the required certificates to live migrate with
"UseTLSTransportForNbd: True".

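The gating logic of the transition path can be sketched in Python. This assumes that each instance can be described by the start time of its qemu process and that `certs_available_at` is the time step 1 finished; the `Instance` record and both functions are hypothetical illustrations, not TripleO or Nova APIs.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Instance:
    """Hypothetical record: an instance and when its qemu process started."""
    uuid: str
    qemu_started_at: datetime


def instances_to_migrate(instances, certs_available_at):
    """Step 2: instances whose qemu process predates the certificates.

    Such a qemu process runs in an environment without the certificates,
    so it has to be migrated once to get a fresh qemu process.
    """
    return [i.uuid for i in instances if i.qemu_started_at < certs_available_at]


def safe_to_enable_tls_nbd(instances, certs_available_at):
    """Step 3 is only safe once step 2 leaves nothing to migrate."""
    return not instances_to_migrate(instances, certs_available_at)
```

Comparing process start times is only a proxy here; in practice the decisive question is whether each qemu process can actually see the certificate files.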
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/759232

Changed in tripleo:
assignee: nobody → Martin Schuppert (mschuppert)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/760524

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-tripleoclient (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/760535

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-tripleoclient (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/760789

OpenStack Infra (hudson-openstack) wrote : Change abandoned on python-tripleoclient (master)

Change abandoned by Martin Schuppert (<email address hidden>) on branch: master
Review: https://review.opendev.org/760535
Reason: abandon in favor of https://review.opendev.org/#/c/760789/

OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (master)

Reviewed: https://review.opendev.org/760524
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=e3a3d3daf4f3a2d4b818129e93e0913346e295ea
Submitter: Zuul
Branch: master

commit e3a3d3daf4f3a2d4b818129e93e0913346e295ea
Author: Martin Schuppert <email address hidden>
Date: Fri Oct 30 10:29:18 2020 +0100

    Make sure qemu CA has correct permissions

    Make sure the qemu CA has the correct permissions (0644) to be bind
    mounted into the libvirt container.

    Related-Bug: #1900986
    Related: https://bugzilla.redhat.com/show_bug.cgi?id=1888951

    Change-Id: I9538b7e579d4921b14f6ef5eec0300e7e50628d4
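The actual fix lives in puppet-tripleo; the following is only a Python sketch of the same idempotent check, assuming the CA file is something like `/etc/pki/qemu/ca-cert.pem` and that 0644 (world-readable) is what lets the non-root qemu user in the container read it. The helper name `ensure_mode` is hypothetical.

```python
import os
import stat


def ensure_mode(path, mode=0o644):
    """Set `path` to `mode` if needed; return True if it was changed.

    0644 makes the file readable by every user, so a qemu process running
    as a non-root user can read the bind-mounted CA certificate.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current != mode:
        os.chmod(path, mode)
        return True
    return False
```

Returning whether a change was made mirrors how configuration management tools report idempotent changes.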

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/760522
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=e07e571ba205097a23d9dc4e1fb9a6645351c248
Submitter: Zuul
Branch: master

commit e07e571ba205097a23d9dc4e1fb9a6645351c248
Author: Martin Schuppert <email address hidden>
Date: Fri Oct 30 10:43:58 2020 +0100

    Use bind mounts for tls certificates

    Certificates get merged into the containers using the kolla_config
    mechanism. If a certificate changes, or e.g. UseTLSTransportForNbd
    gets disabled and later re-enabled, the containers running the
    qemu process miss the required certificates and live migration
    fails.
    This change moves to bind mounts for the certificates and, in the
    case of UseTLSTransportForNbd, creates the required certificates even
    if UseTLSTransportForNbd is set to False. With this,
    UseTLSTransportForNbd can be enabled/disabled, as the required bind
    mounts/certificates are already present.

    Related-Bug: #1900986
    Related: https://bugzilla.redhat.com/show_bug.cgi?id=1888951

    Depends-On: I9538b7e579d4921b14f6ef5eec0300e7e50628d4

    Change-Id: I7f583d18e558b95922a66eb539cc91de74409c96
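A real bind mount needs privileges, so the following Python sketch only simulates the difference the commit describes: the "merged" view is a one-time copy taken at container creation (the kolla_config behavior), while the "bind" view is approximated by reading the host directory directly. All function names are hypothetical.

```python
import shutil
from pathlib import Path


def create_container_merged(host_dir, container_dir):
    """kolla_config-style: copy the host files once, at container create."""
    shutil.copytree(host_dir, container_dir, dirs_exist_ok=True)
    return Path(container_dir)


def create_container_bind(host_dir):
    """Bind-mount-style: the container reads the live host directory."""
    return Path(host_dir)


def visible(view, name):
    """Return True if `name` is visible in the container's view."""
    return (view / name).exists()
```

A certificate written to the host directory after both "containers" are created is visible through the bind view but not through the merged copy, which is the situation of the pre-update qemu processes.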

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/761837

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/761838

OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (stable/victoria)

Reviewed: https://review.opendev.org/761837
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=8e3eac8ff752e35b04dae9ca104eca7d9409257a
Submitter: Zuul
Branch: stable/victoria

commit 8e3eac8ff752e35b04dae9ca104eca7d9409257a
Author: Martin Schuppert <email address hidden>
Date: Fri Oct 30 10:29:18 2020 +0100

    Make sure qemu CA has correct permissions

    Make sure the qemu CA has the correct permissions (0644) to be bind
    mounted into the libvirt container.

    Related-Bug: #1900986
    Related: https://bugzilla.redhat.com/show_bug.cgi?id=1888951

    Change-Id: I9538b7e579d4921b14f6ef5eec0300e7e50628d4
    (cherry picked from commit e3a3d3daf4f3a2d4b818129e93e0913346e295ea)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/victoria)

Reviewed: https://review.opendev.org/761838
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6e59a84ec0082b17b54acbc0374700217a3f684b
Submitter: Zuul
Branch: stable/victoria

commit 6e59a84ec0082b17b54acbc0374700217a3f684b
Author: Martin Schuppert <email address hidden>
Date: Fri Oct 30 10:43:58 2020 +0100

    Use bind mounts for tls certificates

    Certificates get merged into the containers using the kolla_config
    mechanism. If a certificate changes, or e.g. UseTLSTransportForNbd
    gets disabled and later re-enabled, the containers running the
    qemu process miss the required certificates and live migration
    fails.
    This change moves to bind mounts for the certificates and, in the
    case of UseTLSTransportForNbd, creates the required certificates even
    if UseTLSTransportForNbd is set to False. With this,
    UseTLSTransportForNbd can be enabled/disabled, as the required bind
    mounts/certificates are already present.

    Related-Bug: #1900986
    Related: https://bugzilla.redhat.com/show_bug.cgi?id=1888951

    Depends-On: I9538b7e579d4921b14f6ef5eec0300e7e50628d4

    Change-Id: I7f583d18e558b95922a66eb539cc91de74409c96
    (cherry picked from commit e07e571ba205097a23d9dc4e1fb9a6645351c248)

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/762305

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/762306

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/762497

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/762497
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=222c67e128620cfd4cc0d65ddd15d73991d486d6
Submitter: Zuul
Branch: master

commit 222c67e128620cfd4cc0d65ddd15d73991d486d6
Author: Martin Schuppert <email address hidden>
Date: Thu Nov 12 11:44:55 2020 +0100

    Change qemu user id to match previous releases

    The qemu user on the host is created with uid/gid 107. Certificates
    on the host, but also the vhost-user sockets created by ovs, use this
    uid/gid. With the move to TCIB images, the ids were reverted to the
    kolla defaults and the previous override was dropped. This makes,
    e.g., the qemu processes fail to use the libvirt-vnc bind-mounted
    certificates.
    This change brings back the previous override of the qemu user
    uid/gid.

    Closes-Bug: #1903508
    Related-Bug: #1900986

    Change-Id: I54b9d9f341b521b415a6dccc6c78ae7a77821f6f
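A uid/gid mismatch like this can be spotted by comparing the ownership of the bind-mounted files against the expected qemu ids. This Python sketch takes the uid/gid 107 from the commit message above; which paths to pass in (for example the libvirt-vnc certificates or vhost-user sockets) is an assumption, and `ownership_mismatches` is a hypothetical helper.

```python
import os

# From the commit message: the host qemu user uses uid/gid 107.
QEMU_UID = 107
QEMU_GID = 107


def ownership_mismatches(paths, uid=QEMU_UID, gid=QEMU_GID):
    """Return (path, uid, gid) for each path not owned by uid/gid."""
    result = []
    for p in paths:
        st = os.stat(p)
        if (st.st_uid, st.st_gid) != (uid, gid):
            result.append((str(p), st.st_uid, st.st_gid))
    return result
```

An empty result only shows that the files match uid/gid 107; the qemu user inside the container must also map to the same ids for access to work.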

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/763185

Bogdan Dobrelya (bogdando) wrote :

There are many other places where certificates are copied in via kolla configs. Should we change them all to use bind mounts?

Changed in tripleo:
importance: Undecided → High
milestone: none → wallaby-rc1
Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Changed in tripleo:
milestone: xena-1 → xena-2
Changed in tripleo:
milestone: xena-2 → xena-3
Changed in tripleo:
status: In Progress → Fix Released