Instance fails to boot: Could not access KVM kernel module: Permission denied

Bug #1681461 reported by Mohammed Naser
This bug affects 10 people
Affects: kolla-ansible
Status: Fix Released
Importance: High
Assigned to: Radosław Piliszek

Bug Description

Could not access KVM kernel module: Permission denied
failed to initialize KVM: Permission denied

It seems like libvirt is unable to access /dev/kvm

Mohammed Naser (mnaser) wrote :

This seems to have caused the issue:

https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1637601

The patch for xenial:

https://launchpadlibrarian.net/291206176/xenial_libvirt_uidgid.debdiff

- --with-qemu-group=kvm \
+ --with-qemu-group=libvirt-qemu \

libvirt is now built with the QEMU group `libvirt-qemu` rather than `kvm`. It looks like this change broke our behaviour, so I looked at the following in the libvirt docs:

https://libvirt.org/drvqemu.html
"Regardless of this build time default, administrators can set a per-host default setting in the /etc/libvirt/qemu.conf configuration file via the user=$USERNAME and group=$GROUPNAME parameters. When a non-root user or group is configured, the libvirt QEMU driver will change uid/gid to match immediately before executing the QEMU binary for a virtual machine."

We don't enforce a user or group ourselves, so if the distribution's build-time default changes, we break. I'll propose a patch that sets them explicitly in qemu.conf, which should keep changes like this from affecting us.
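
A minimal sketch of how one might check whether a given qemu.conf pins the user and group explicitly instead of relying on the build-time default. The helper name `qemu_conf_value` is an assumption made for illustration and is not part of kolla or libvirt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a quoted `key = "value"` setting
# from a qemu.conf-style file, ignoring commented-out lines.
# Prints nothing if the key is unset, in which case libvirt falls back to
# its build-time default.
qemu_conf_value() {
    local key="$1" file="$2"
    sed -n -E "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"([^\"]*)\".*/\1/p" "$file" | tail -n 1
}
```

For example, `qemu_conf_value group /etc/libvirt/qemu.conf` prints the configured group, or nothing when the setting is left at its default.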

Changed in kolla:
status: New → Confirmed
assignee: nobody → Mohammed Naser (mnaser)
Eduardo Gonzalez (egonzalez90) wrote :

Mohammed, are you still working on this issue? Is it breaking KVM on Ubuntu? If so, let me know and I will change the Importance to Critical. No other user has reported this issue.
Regards

Ali Sanhaji (ali-sanhaji) wrote :

I had this exact issue. After a few hours of debugging, it seems that the problem wasn't a user/group issue, but rather a nested virtualization issue.

I run an all-in-one kolla on a VM (let's call it AIO). The host of AIO has the nested parameter set to Y in the kvm_intel module, so /dev/kvm shows up in AIO. But AIO itself did not have the nested parameter set to Y. When I set it to Y:

$ cat /sys/module/kvm_intel/parameters/nested
N
$ sudo rmmod kvm_intel
$ echo 'options kvm_intel nested=1' | sudo tee -a /etc/modprobe.d/kvm.conf > /dev/null
$ sudo modprobe kvm_intel
$ cat /sys/module/kvm_intel/parameters/nested
Y

The problem "Could not access KVM kernel module: Permission denied" disappeared, and nova could launch virtual machines on AIO.
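
The check above can be wrapped in a small function so it is easy to repeat on each compute node. This is only a sketch: the name `nested_enabled` is made up for illustration, and the default path assumes an Intel host (AMD hosts use the kvm_amd module instead).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a KVM module has nested virtualization on.
# The parameter file reads "Y" (or "1" for some modules) when enabled.
# Returns 0 if enabled, 1 if disabled, 2 if the file cannot be read
# (for example because the module is not loaded).
nested_enabled() {
    local param_file="${1:-/sys/module/kvm_intel/parameters/nested}"
    [[ -r "$param_file" ]] || return 2
    case "$(cat "$param_file")" in
        Y|y|1) return 0 ;;
        *)     return 1 ;;
    esac
}
```

Typical use would be `nested_enabled && echo "nested virtualization is on"`.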

Ali Sanhaji (ali-sanhaji) wrote :

I ran a multinode deployment (ocata 4.0.0) on bare metal and ran into this issue again. To solve the problem, I had to change the ownership of /dev/kvm on one of the compute nodes where the VMs were failing to boot:

$ sudo chown root:42427 /dev/kvm
(The ownership was root:root before.)

42427 is the group ID of qemu in the nova compute container. The user "nova" is part of this group, and is used by libvirt to launch VMs (check /etc/libvirt/qemu.conf in the nova libvirt container).
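
A quick way to see whether a node is affected is to compare the group of /dev/kvm with the GID the container expects. The sketch below assumes GNU stat (coreutils); the function name `has_group_id` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given path is group-owned by the
# expected numeric GID. Returns 2 if the path cannot be inspected.
has_group_id() {
    local path="$1" expected_gid="$2" actual_gid
    actual_gid="$(stat -c '%g' "$path")" || return 2
    [[ "$actual_gid" == "$expected_gid" ]]
}
```

For example, `has_group_id /dev/kvm 42427 || echo "/dev/kvm has a different group"`, where 42427 is the GID mentioned in this thread and may differ in other deployments.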

Gaëtan Trellu (goldyfruit) wrote :

Same issue here. Running this command on the hypervisor (not in the container) solved it:

$ chown root:42427 /dev/kvm

Randy (randymartini) wrote :

There is a similar issue on Ubuntu 18.04. /dev/kvm is owned by root:kvm. Running chgrp on /dev/kvm only provides a temporary fix: something in the system reverts the group back to kvm. In addition, the kvm group has GID 113 on the hypervisor host but GID 106 in the nova_libvirt container.

I changed /etc/kolla/nova-libvirt/qemu.conf to:
stdio_handler = "file"

user = "nova"
group = "kvm"

I also appended "kvm:x:113:nova" to /etc/group in the nova_libvirt container. Not a pretty fix, but it did work around the issue.
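
The GID mismatch described above can be confirmed by looking up the same group name in the host's and the container's group files. A minimal sketch, with `group_gid` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the numeric GID of a group from a group(5)-format
# file (name:password:gid:members). Defaults to /etc/group.
# Prints nothing if the group is not present.
group_gid() {
    local group="$1" file="${2:-/etc/group}"
    awk -F: -v g="$group" '$1 == g { print $3; exit }' "$file"
}
```

Running `group_gid kvm` on the host and comparing it with the kvm entry in the container's /etc/group shows whether the two sides agree on the number, which is what matters for a device node shared between them.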

Shyam (shyam.biradar) wrote :

Hi Team,

We are also facing the same issue: VM launch is failing. Even if we change the /dev/kvm ownership, it automatically gets reverted to root:kvm. The fix suggested by Randy may work, but it is not good practice to change containers on the fly, so I would not go ahead with that.

What is the plan to resolve this issue? Is there any other solution?

Mark Goddard (mgoddard) wrote :

It's odd that only some people are seeing this. Perhaps only some people are using KVM on ubuntu? Surprising though, given that's the default.

We have the following code in extend_start.sh in the nova-libvirt container:

if [[ -c /dev/kvm ]]; then
    chmod 660 /dev/kvm
    chown root:qemu /dev/kvm
fi

And the Dockerfile configures the nova user to be a member of the qemu group.

Which version of OS and kolla/kolla-ansible are you using?

Ravinder Kumar (rhcayadav) wrote :

This issue occurs on both Ubuntu 16.04 and 18.04. It's a critical bug. Please provide a solution.

Ravinder Kumar (rhcayadav) wrote :

@Eduardo Gonzalez, please assign this bug to someone else. It needs to be resolved soon.

Mark Goddard (mgoddard)
Changed in kolla:
assignee: Mohammed Naser (mnaser) → nobody
importance: Undecided → High
Mark Goddard (mgoddard) wrote :

I tried an ubuntu/binary deploy using kolla/kolla-ansible master branch, with virt_type KVM. I was able to boot an instance. I have the following perms on /dev/kvm:

ls /dev/kvm -l
crw-rw---- 1 root 42427 10, 232 Jul 9 13:03 /dev/kvm

Mark Goddard (mgoddard) wrote :

I rebooted the openstack all-in-one controller/compute node and after starting the nova instance again it's back up.

Mario Torrisi (mtorrisi) wrote :

I'm also running into this issue.
In our case the compute node is a virtual instance based on Ubuntu 18.04 with nested virtualization enabled.

Even though I set qemu.conf as below

# cat qemu.conf
stdio_handler = "file"

user = "root"
group = "kvm"

max_files = 32768
max_processes = 131072

and /dev/kvm has the following permissions

# ll /dev/kvm
crw-rw---- 1 root qemu 10, 232 Jul 17 15:40 /dev/kvm

I get this error and no instance can be executed.

2019-07-17 16:19:29.315 6 ERROR nova.compute.manager [req-3f7e8a54-cb86-41c6-8ddf-a0bea03dfd39 7bb006e405c9468b9b3f4244fd20e843 5630599bae6a429a8e11d7abaa1b5afd - default default] [instance: 1931e69e-1415-4eef-b6c1-8da1e6503dd4] Instance failed to spawn: libvirtError: internal error: qemu unexpectedly closed the monitor: 2019-07-17T14:19:28.418374Z qemu-kvm: -chardev pty,id=charserial0,logfile=/var/lib/nova/instances/1931e69e-1415-4eef-b6c1-8da1e6503dd4/console.log,logappend=off: Unable to open logfile /var/lib/nova/instances/1931e69e-1415-4eef-b6c1-8da1e6503dd4/console.log: Permission denied

Any help would be appreciated.

Thanks in advance.

zibort (zibort) wrote :

In that environment:

kolla_base_distro: "ubuntu"
kolla_install_type: "binary"
openstack_release: "rocky" or openstack_release: "rocky"
kolla-ansible 6.x or 7.x

I use a workaround: customizing udev on the compute hosts:
# cat /etc/udev/rules.d/60-qemu-system-common.rules
KERNEL=="kvm", GROUP="42427", MODE="0660"

followed by a reboot.

After that, the group of /dev/kvm is no longer changed.

zibort (zibort) wrote :

Regarding my post above: the OS on the compute hosts is Ubuntu 18.04.

Chason Chan (chen-xing)
Changed in kolla:
assignee: nobody → Chason Chan (chen-xing)
Chason Chan (chen-xing) wrote :

I will add a troubleshooting section to the docs for this bug as a temporary fix.

Yih Leong Sun (yihleong) wrote :

This is also reproducible with OpenStack Rocky on a CentOS 7.6 host.

Viorel-Cosmin Miron (uhl-hosting) wrote :

It is also reproducible up to Train.

Radosław Piliszek (yoctozepto) wrote :

It seems to be reproducible roulette-style. ;-)

That said, I never managed to get it. There must be something particular about the affected hosts.

Chason Chan (chen-xing)
Changed in kolla:
assignee: Chason Chan (chen-xing) → nobody
Viorel-Cosmin Miron (uhl-hosting) wrote :

I rebooted and applied one of the fixes above; it worked for me.

Valdemar Lemche (atterdag) wrote :

I've been running OpenStack Kolla since beginning of April on the same hosts without this problem.

Last week I decided to completely destroy the installation and start over, and now I have this issue.

I'm running Ubuntu 18.04.4 amd64, and the qemu-system-common package, which contains /lib/udev/rules.d/60-qemu-system-common.rules, hasn't been updated since I installed the host, as far as I can see.

If I switch nova_compute_virt_type to "qemu" in /etc/kolla/globals.yml, then nova works fine, obviously.

I've tried with both openstack_release "train" and "master"

I install kolla using git, and not from pypi.

Valdemar Lemche (atterdag) wrote :

I realized what the problem is.

I just remembered that I had only recently installed qemu-system-common: I was playing around with getting octavia working, and I installed qemu on my host to build the amphora image.

So that of course created a kvm group on my host with gid 115, whereas it's 106 in the nova_libvirt container.

... and I couldn't even get octavia to work :/

Radosław Piliszek (yoctozepto) wrote :

going to triage this

Changed in kolla:
assignee: nobody → Radosław Piliszek (yoctozepto)
Radosław Piliszek (yoctozepto) wrote :

OK, I triaged this issue.

So, indeed the culprit is the qemu-system-common Ubuntu package installed on the host. It contains the following files dealing with /dev/kvm permissions:
- /lib/udev/rules.d/60-qemu-system-common.rules (this one is the nastiest)
- /lib/systemd/system/qemu-kvm.service (this is one-shot and runs the script below; once just after install due to Ubuntu policy of starting services for user, then on each boot)
- /usr/share/qemu/init/qemu-kvm-init (sets the perms once for the service above)

The systemd service has only a negligible chance of triggering after kolla's nova_libvirt container sets its own permissions, since containerd is much slower to start than this one-shot. It is still possible, though, and right after package installation it is obviously a deployment killer.
The udev rule, on the other hand, always gets triggered right after the libvirt container changes ownership, because libvirt triggers udev to rescan devices...
The proposed workaround of modifying /lib/udev/rules.d/60-qemu-system-common.rules to contain the desired gid seems sane, but this file is not a configuration file, so the package manager may freely replace it (e.g. on upgrade). It would be preferable to put an override in /etc/udev/rules.d that enforces kolla's gid, e.g. /etc/udev/rules.d/99-kolla-kvm.rules with:
KERNEL=="kvm", GROUP="42427", MODE="0660"
The remaining issue is the systemd service, which we can't really control if it gets installed afterwards. We can only really warn people not to install qemu on the host, as it is not supported; not installing it avoids both issues, so I am not sure whether there is anything that we should really *fix* on the kolla side.
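
The /etc/udev/rules.d override described above could be generated roughly as follows. This is a sketch, not the content of any merged patch: the helper name `kvm_udev_rule` is hypothetical, and 42427 is used as the default only because it is the GID reported in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a udev rule that pins /dev/kvm to the given
# numeric GID with mode 0660. Defaults to GID 42427 as quoted in this bug.
kvm_udev_rule() {
    local gid="${1:-42427}"
    printf 'KERNEL=="kvm", GROUP="%s", MODE="0660"\n' "$gid"
}
```

One could write it out with `kvm_udev_rule | sudo tee /etc/udev/rules.d/99-kolla-kvm.rules`, then run `sudo udevadm control --reload` and `sudo udevadm trigger --sysname-match=kvm` so the rule is applied without waiting for a reboot.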

Radosław Piliszek (yoctozepto) wrote :

Oh, it seems we can actually tell systemd to mask this service even when it is not installed. So we can in fact work around both issues. Ugly, but it works.
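
Masking works for a unit that is not installed because a masked unit is simply a symlink to /dev/null in the administrator's unit directory. The sketch below shows that end state by hand; `mask_unit` is a hypothetical helper, and this is not necessarily how the eventual fix implements it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mask a unit manually by symlinking its name to
# /dev/null in the given unit directory (default: /etc/systemd/system).
# The unit file itself does not need to exist for this to succeed.
mask_unit() {
    local unit="$1" unit_dir="${2:-/etc/systemd/system}"
    ln -sfn /dev/null "${unit_dir}/${unit}"
}
```

On a real host, `sudo systemctl mask qemu-kvm.service` is the simpler route and leaves the same symlink behind.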

Changed in kolla:
status: Confirmed → In Progress
Changed in kolla-ansible:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Radosław Piliszek (yoctozepto)
Radosław Piliszek (yoctozepto) wrote :

This package seems to have identical structure in both xenial (16.04) and bionic (18.04) so we would handle both this way. The package in focal seems different and lacks the udev rule part (it could be somewhere else now - to check [maybe they changed the default perms... or figured out too much magic at once is bad]).

no longer affects: kolla
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/735441

Radosław Piliszek (yoctozepto) wrote :

Please let me (us) know whether the above fix works for you. That said, it might be a little tricky to backport to something older than Train.

Radosław Piliszek (yoctozepto) wrote :

It seems this patch fixes CentOS analogously, as it too can have a udev rule that makes /dev/kvm usable by host-level libvirt instead of kolla's.

Radosław Piliszek (yoctozepto) wrote :

On CentOS 7 the file is /lib/udev/rules.d/80-kvm.rules provided by qemu-kvm(-ev) package but it sets permissions to 0666 so that everyone can use /dev/kvm.
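
Since the rule file name differs between distributions, it can help to search the udev rule directories for anything that matches the kvm device. A minimal sketch, assuming the rules use the literal form KERNEL=="kvm" (rules written differently would be missed); `find_kvm_rules` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list udev rule files under the given directories that
# contain a literal KERNEL=="kvm" match. Missing directories are skipped.
find_kvm_rules() {
    local dir
    for dir in "$@"; do
        [[ -d "$dir" ]] || continue
        grep -l -s 'KERNEL=="kvm"' "$dir"/*.rules || true
    done
    return 0
}
```

For example, `find_kvm_rules /lib/udev/rules.d /etc/udev/rules.d` lists every candidate file on the host.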

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/735463

OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (master)

Change abandoned by Radosław Piliszek (<email address hidden>) on branch: master
Review: https://review.opendev.org/735463
Reason: and thus the fix is confirmed to work

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/735441
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=202365e70213fe7f23d1d618789e356e24ed0679
Submitter: Zuul
Branch: master

commit 202365e70213fe7f23d1d618789e356e24ed0679
Author: Radosław Piliszek <email address hidden>
Date: Sat Jun 13 21:03:59 2020 +0200

    Make /dev/kvm permissions handling more robust

    This makes use of udev rules to make it smarter and override
    host-level packages settings.
    Additionally, this masks Ubuntu-only service that is another
    pain point in terms of /dev/kvm permissions.
    Fingers crossed for no further surprises.

    Change-Id: I61235b51e2e1325b8a9b4f85bf634f663c7ec3cc
    Closes-bug: #1681461

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/742466

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/742467

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/742471

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/ussuri)

Reviewed: https://review.opendev.org/742466
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=6a6ee3df294f38cbba98f4a34a7ac182e8abce71
Submitter: Zuul
Branch: stable/ussuri

commit 6a6ee3df294f38cbba98f4a34a7ac182e8abce71
Author: Radosław Piliszek <email address hidden>
Date: Sat Jun 13 21:03:59 2020 +0200

    Make /dev/kvm permissions handling more robust

    This makes use of udev rules to make it smarter and override
    host-level packages settings.
    Additionally, this masks Ubuntu-only service that is another
    pain point in terms of /dev/kvm permissions.
    Fingers crossed for no further surprises.

    Change-Id: I61235b51e2e1325b8a9b4f85bf634f663c7ec3cc
    Closes-bug: #1681461
    (cherry picked from commit 202365e70213fe7f23d1d618789e356e24ed0679)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/train)

Reviewed: https://review.opendev.org/742467
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=574257dc2eb5a0fe9031e7aa0bb79ff0851526c6
Submitter: Zuul
Branch: stable/train

commit 574257dc2eb5a0fe9031e7aa0bb79ff0851526c6
Author: Radosław Piliszek <email address hidden>
Date: Sat Jun 13 21:03:59 2020 +0200

    Make /dev/kvm permissions handling more robust

    This makes use of udev rules to make it smarter and override
    host-level packages settings.
    Additionally, this masks Ubuntu-only service that is another
    pain point in terms of /dev/kvm permissions.
    Fingers crossed for no further surprises.

    Change-Id: I61235b51e2e1325b8a9b4f85bf634f663c7ec3cc
    Closes-bug: #1681461
    (cherry picked from commit 202365e70213fe7f23d1d618789e356e24ed0679)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/stein)

Reviewed: https://review.opendev.org/742471
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=b0667666262ccb60efddf66126284e2f24e98ccd
Submitter: Zuul
Branch: stable/stein

commit b0667666262ccb60efddf66126284e2f24e98ccd
Author: Radosław Piliszek <email address hidden>
Date: Sat Jun 13 21:03:59 2020 +0200

    Make /dev/kvm permissions handling more robust

    This makes use of udev rules to make it smarter and override
    host-level packages settings.
    Additionally, this masks Ubuntu-only service that is another
    pain point in terms of /dev/kvm permissions.
    Fingers crossed for no further surprises.

    Change-Id: I61235b51e2e1325b8a9b4f85bf634f663c7ec3cc
    Closes-bug: #1681461
    (cherry picked from commit 202365e70213fe7f23d1d618789e356e24ed0679)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 8.3.0

This issue was fixed in the openstack/kolla-ansible 8.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 10.2.0

This issue was fixed in the openstack/kolla-ansible 10.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 9.3.0

This issue was fixed in the openstack/kolla-ansible 9.3.0 release.

ehcpdeveloper (ehcpdeveloper) wrote :

I am using the latest kolla-ansible version (I cannot see the exact version; I tried --version and -V, which did not help).
I am getting this error on one host, and I don't know why on that host in particular.
The other two hosts work normally.
I currently have 1 controller and 2 nodes.
I cannot create an instance on one of the nodes; I get the same error.

```
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/tpool.py", line 83, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1265, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
    libvirt.libvirtError: internal error: process exited while connecting to monitor: Could not access KVM kernel module: Permission denied
2021-03-31T21:16:49.166961Z qemu-system-x86_64: failed to initialize KVM: Permission denied
```
