"Failed to login to virsh console" error while composing a machine using pod on another node

Bug #1942390 reported by Zia Mulla
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Invalid
Undecided
Björn Tillenius

Bug Description

I have a MAAS 2.8 version installed on one of my nodes. Configured it with 2 KVM pods. One pod connects to virsh on the same node where MAAS is hosted. The second KVM pod is hosted on another node (server).
I have add the required SSH keys in MAAS and am able to add a pod but composing a machine fails. If I keep on retrying my attempts, at times it goes through. Not sure if there are any other settings missing.

When I checked the rackd logs, it shows this error:

2021-09-02 04:52:26 provisioningserver.rpc.pods: [critical] node2-pod: Failed to compose machine: RequestedMachine(hostname='subtle-tomcat', architecture='amd64/generic', cores=1, memory=2048, block_devices=[RequestedMachineBlockDevice(size=8000000000, tags=['default'])], interfaces=[RequestedMachineInterface(ifname=None, attach_name=None, attach_type=None, attach_options=None, requested_ips=[], ip_mode=None)], cpu_speed=None, known_host_interfaces=[KnownHostInterface(ifname='ens160', attach_type='macvlan', dhcp_enabled=False)])
 Traceback (most recent call last):
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 500, in errback
     self._startRunCallbacks(fail)
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 567, in _startRunCallbacks
     self._runCallbacks()
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
     current.result = callback(current.result, *args, **kw)
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1442, in gotResult
     _inlineCallbacks(r, g, deferred)
 --- <exception caught here> ---
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
     result = result.throwExceptionIntoGenerator(g)
   File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
     return g.throw(self.type, self.value, self.tb)
   File "/usr/lib/python3/dist-packages/provisioningserver/drivers/pod/virsh.py", line 1444, in compose
     conn = yield self.get_virsh_connection(context)
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
     result = g.send(result)
   File "/usr/lib/python3/dist-packages/provisioningserver/drivers/pod/virsh.py", line 1399, in get_virsh_connection
     raise VirshError("Failed to login to virsh console.")
 provisioningserver.drivers.pod.virsh.VirshError: Failed to login to virsh console.

MAAS Version: 2.8
Ubuntu version: 18.04
virsh version: 4.0.0

I have a bridge interface (br0) added and attached to the virsh as a network.

Let me know if any other information is required.

Zia Mulla (zmulla)
description: updated
Revision history for this message
Christian Grabowski (cgrabowski) wrote :

Hi there, to make sure I understand, you have 2 host machines, 1 of those hosts is running MAAS and both hosts are running virsh? And the host running MAAS and virsh can connect but MAAS cannot connect to the remote host?

Are there multiple routes between the two host machines? And is it possible there are firewall rules for one or more of those routes but not all?

Changed in maas:
status: New → Incomplete
Revision history for this message
Zia Mulla (zmulla) wrote (last edit ):

I do have multiple routes. On is for the interface connecting to the internet and the other is created as part of addition of a bridge interface for the KVMs. The route table looks like this:
ubuntu@charm-57:~$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.9.158.1 0.0.0.0 UG 100 0 0 ens160
0.0.0.0 10.21.81.254 0.0.0.0 UG 425 0 0 br0
10.9.158.0 0.0.0.0 255.255.255.0 U 100 0 0 ens160
10.21.81.0 0.0.0.0 255.255.255.0 U 425 0 0 br0

Not sure if this could be the cause of the problem. The bridge configurations seem to be inline to what MAAS documentation talks about.

There is no FW between these 2 hosts. The IP addresses on the main interface on these 2 hosts belong to the same subnet.
10.9.158.57 and 10.9.158.58.

Changed in maas:
status: Incomplete → New
Revision history for this message
Alberto Donato (ack) wrote :

Would it be possible to upgrade your maas deployment?
We've made some improvement to how maas handles virsh hosts and improved logging, so upgrading might provide more useful logs.

Changed in maas:
status: New → Incomplete
Revision history for this message
Zia Mulla (zmulla) wrote :

I upgraded my host Ubuntu version to 20.04 and also the MAAS to 3.0 (Version: 1:3.0.0-10029-g.986ea3e45-0ubuntu1~20.04.1). I recreated the entire MAAS system from ground-up after un-installing it.

I still see issues with MAAS being able to communicate with the KVM Pod.
From the logs I see this error:

2021-09-07 12:53:57 provisioningserver.rpc.pods: [critical] Failed to decompose machine.
 Traceback (most recent call last):
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 501, in errback
     self._startRunCallbacks(fail)
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
     self._runCallbacks()
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
     current.result = callback(current.result, *args, **kw)
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1475, in gotResult
     _inlineCallbacks(r, g, status)
 --- <exception caught here> ---
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
     result = result.throwExceptionIntoGenerator(g)
   File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
     return g.throw(self.type, self.value, self.tb)
   File "/usr/lib/python3/dist-packages/provisioningserver/drivers/pod/virsh.py", line 1463, in decompose
     conn = yield self.get_virsh_connection(context)
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
     result = g.send(result)
   File "/usr/lib/python3/dist-packages/provisioningserver/drivers/pod/virsh.py", line 1405, in get_virsh_connection
     raise VirshError("Failed to login to virsh console.")
 provisioningserver.drivers.pod.virsh.VirshError: Failed to login to virsh console.

After multiple attempts, it does end up creating the machine but commissioning keeps failing with this message: Failed to query node's BMC - Failed to login to virsh console.

Changed in maas:
status: Incomplete → New
Revision history for this message
Alberto Donato (ack) wrote :

Are you using an ssh password or key? Also, do you have a single rack or multiple?

If you have more than one rack and you're using an ssh key, you have to ensure that the private key is available for the maas user on each rack.

Changed in maas:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired
Revision history for this message
Bartłomiej Poniecki-Klotz (barteus) wrote :

I have the same issue with MAAS versions 3.4/stable & 3.3/stable

The problem occurs during VM creation process.
The issue is not consistent - I can run same command twice or 3-times until I get VM created. Previous ones result with "Failed to login to virsh console". Additionally, during each redeployment different VMs pose problems.

Here is a log from rackd: https://pastebin.canonical.com/p/8RHvhDZWtr/

Versions:
OS: Ubuntu 22.04
MAAS: 3.4/stable & 3.3/stable (3.3.5-13222-g.78dd996c0)
virsh: 8.0.0

Revision history for this message
Bartłomiej Poniecki-Klotz (barteus) wrote :

subscribed ~field-critical

Revision history for this message
Adam Collard (adam-collard) wrote :

@barteus please respond to the questions asked in #5

Changed in maas:
status: Expired → Incomplete
Revision history for this message
Bartłomiej Poniecki-Klotz (barteus) wrote :

Are you using an ssh password or key?
- we use ssh key

Also, do you have a single rack or multiple?
- 3 rack design

If you have more than one rack and you're using an ssh key, you have to ensure that the private key is available for the maas user on each rack.
- On each of the racks in /var/lib/maas/.ssh/authorized_keys there is the ssh key

Revision history for this message
Bartłomiej Poniecki-Klotz (barteus) wrote :

Environment details:

$ snap list maas
Name Version Rev Tracking Publisher Notes
maas 3.3.5-13222-g.78dd996c0 32636 3.3/stable canonical✓ -

I used both MAAS CLI and UI to check power (easy way to call communication to the virsh console)

Logs:
- maas.log: https://pastebin.canonical.com/p/ZtGPSmbQp6/
- rackd.log: https://pastebin.canonical.com/p/6thxw55mrQ/
- regiond.log: https://pastebin.canonical.com/p/c9shkDFTD5/

Revision history for this message
Björn Tillenius (bjornt) wrote :

I got access to another environment and after adding some debugging, I noticed that virsh is intermittently returning this error:

b'libvirt: error : cannot execute binary /usr/bin/pkttyagent: '

Changed in maas:
assignee: nobody → Björn Tillenius (bjornt)
Revision history for this message
Bartłomiej Poniecki-Klotz (barteus) wrote :

As requested, I moved the discussion to a separate issue because the cause of the new issue is different from the original.

Moved to issue: https://bugs.launchpad.net/maas/+bug/2053033

Revision history for this message
Bartosz Woronicz (mastier1) wrote (last edit ):

I enabled debug: true in rackd and regiond and restarted snap, yet I don't see anything extra there

sudo tail -f /var/snap/maas/common/log/maas.log
...
2024-02-13T17:24:55.021942+00:00 ob88 maas.power: [error] microk8s-2-1: Power state could not be queried: Failed to login to virsh console.
2024-02-13T17:24:55.053085+00:00 ob88 maas.power: [error] microk8s-2-1: Failed to refresh power state: Failed to login to virsh console.

I grep all logs around pkttyagent bit I don't see anything

And I use 3.4.0
installed: 3.4.0-14321-g.1027c7664 (32469) 138MB -

It just happens after reboot sometimes

Libvirt is running. Only the last message is weird.

buntu@ob88:~/work/fce-orangebox88/openstack/yoga$ systemctl status libvirtd
● libvirtd.service - Virtualization daemon
     Loaded: loaded (/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2024-02-13 09:18:22 UTC; 8h ago
TriggeredBy: ● libvirtd-admin.socket
             ● libvirtd.socket
             ● libvirtd-ro.socket
       Docs: man:libvirtd(8)
             https://libvirt.org
   Main PID: 2147 (libvirtd)
      Tasks: 22 (limit: 32768)
     Memory: 48.1M
        CPU: 9.590s
     CGroup: /system.slice/libvirtd.service
             └─2147 /usr/sbin/libvirtd

lut 13 09:18:22 ob88 systemd[1]: Starting Virtualization daemon...
lut 13 09:18:22 ob88 systemd[1]: Started Virtualization daemon.
lut 13 09:18:33 ob88 libvirtd[2888]: 2024-02-13 09:18:33.728+0000: 2888: info : libvirt version: 8.0.0, package: 1ubuntu7.8 (Lena Voytek <email address hidden> Wed, 29 Nov 2023 14:52:52 -0700)
lut 13 09:18:33 ob88 libvirtd[2888]: 2024-02-13 09:18:33.728+0000: 2888: info : hostname: ob88
lut 13 09:18:33 ob88 libvirtd[2888]: 2024-02-13 09:18:33.728+0000: 2888: warning : virSecurityValidateTimestamp:206 : Invalid XATTR timestamp detected on /var/lib/libvirt/qemu/domain-5-juju-1/master-key.aes secdriver=dac

I can login with virsh locally

root@ob88:~# ls -la /var/snap/maas/32469/root/.ssh/id_rsa
-rw------- 1 root root 2602 lut 8 14:44 /var/snap/maas/32469/root/.ssh/id_rsa

root@ob88:~# virsh -c qemu+ssh://maas@172.27.88.1/system
Welcome to virsh, the virtualization interactive terminal.

Type: 'help' for help with commands
       'quit' to quit

virsh #
virsh # list
 Id Name State
-------------------------
 4 vault-1 running
 5 juju-1 running

Revision history for this message
Bartosz Woronicz (mastier1) wrote :
Download full text (3.8 KiB)

Ok, I also see the this issue in audit log

[31673.399529] audit: type=1400 audit(1707847448.090:1952): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/20671/usr/lib/snapd/snap-confine" pid=58982 comm="snap-confine" capability=12 capname="net_admin"
[31673.399847] audit: type=1400 audit(1707847448.090:1953): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/20671/usr/lib/snapd/snap-confine" pid=58982 comm="snap-confine" capability=38 capname="perfmon"
[31699.319202] audit: type=1400 audit(1707847474.010:1954): apparmor="DENIED" operation="open" class="file" profile="snap.maas.supervisor" name="/etc/gss/mech.d/" pid=53485 comm="python3" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[31759.315389] audit: type=1400 audit(1707847534.005:1955): apparmor="DENIED" operation="open" class="file" profile="snap.maas.supervisor" name="/etc/gss/mech.d/" pid=53483 comm="python3" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[31777.639276] audit: type=1400 audit(1707847552.329:1956): apparmor="DENIED" operation="open" class="file" profile="snap.maas.supervisor" name="/etc/gss/mech.d/" pid=53485 comm="python3" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[31778.361988] audit: type=1400 audit(1707847553.049:1957): apparmor="DENIED" operation="open" class="file" profile="snap.maas.supervisor" name="/etc/gss/mech.d/" pid=53483 comm="python3" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[31803.341484] audit: type=1400 audit(1707847578.029:1958): apparmor="DENIED" operation="open" class="file" profile="snap.maas.supervisor" name="/etc/gss/mech.d/" pid=53485 comm="python3" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[31804.394148] audit: type=1400 audit(1707847579.081:1959): apparmor="DENIED" operation="open" class="file" profile="snap.maas.supervisor" name="/etc/gss/mech.d/" pid=53482 comm="python3" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[31805.621061] audit: type=1400 audit(1707847580.309:1960): apparmor="DENIED" operation="open" class="file" profile="snap.maas.supervisor" name="/etc/gss/mech.d/" pid=53483 comm="python3" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[31839.551043] audit: type=1400 audit(1707847614.241:1961): apparmor="DENIED" operation="exec" class="file" profile="snap.maas.supervisor" name="/usr/bin/pkttyagent" pid=59397 comm="virsh" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
[31839.602369] audit: type=1400 audit(1707847614.289:1962): apparmor="DENIED" operation="exec" class="file" profile="snap.maas.supervisor" name="/usr/bin/pkttyagent" pid=59401 comm="virsh" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
[31839.620055] audit: type=1400 audit(1707847614.309:1963): apparmor="DENIED" operation="exec" class="file" profile="snap.maas.supervisor" name="/usr/bin/pkttyagent" pid=59403 comm="virsh" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
[31839.879851] audit: type=1400 audit(1707847614.569:1964): apparmor="DENIED" operation="exec" class="file" profile="snap.maas.supervisor" name="/usr/bin/pkttyagent" pid=59415 comm="virsh" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
[31840.211133] audit: type=1400 audit(1707847614.901:1965): apparmor="DENIED...

Read more...

Revision history for this message
Alexsander de Souza (alexsander-souza) wrote :

The new issue was handled in https://bugs.launchpad.net/maas/+bug/2053033.

Closing the original issue again (It was expired before)

Changed in maas:
status: Incomplete → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.