maas could not deploy a kvm-based vm via virsh

Bug #1895922 reported by Taihsiang Ho
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
curtin | Invalid | Undecided | Unassigned |
uvtool | Incomplete | Undecided | Unassigned |
curtin (Ubuntu) | New | Undecided | Unassigned |
maas (Ubuntu) | New | Undecided | Unassigned |

Bug Description

The curtin process failed during a MAAS deployment. The same infrastructure worked well until about 3 weeks ago.

[Steps to Reproduce]
1. Enlist and commission a KVM-based VM in MaaS 2.8.2 (8577-g.a3e674063)
2. Deploy the VM node

- Reproducing environment: nodes whose names start with the "scalebot-dev-controller" prefix in the Tremont lab MAAS.
- Those VMs were created (via the virsh command), enlisted (via the MAAS web UI), and commissioned (via the MAAS web UI) manually. They were not created via the MAAS pod flow.

[Expected Result]
1. The VM is deployed

[Actual Result]
Curtin failed during the deployment

Here is the error message:

        Generating grub debconf_selections for devices=['/dev/vda'] uefi=False
        finish: cmd-install/stage-curthooks/builtin/cmd-curthooks/install-grub: FAIL: installing grub to target devices
        finish: cmd-install/stage-curthooks/builtin/cmd-curthooks/configuring-bootloader: FAIL: configuring target system bootloader
        finish: cmd-install/stage-curthooks/builtin/cmd-curthooks: FAIL: curtin command curthooks
        Traceback (most recent call last):
          File "/curtin/curtin/commands/main.py", line 202, in main
            ret = args.func(args)
          File "/curtin/curtin/commands/curthooks.py", line 1770, in curthooks
            builtin_curthooks(cfg, target, state)
          File "/curtin/curtin/commands/curthooks.py", line 1736, in builtin_curthooks
            setup_grub(cfg, target, osfamily=osfamily)
          File "/curtin/curtin/commands/curthooks.py", line 689, in setup_grub
            configure_grub_debconf(instdevs, target, uefi_bootable)
          File "/curtin/curtin/commands/curthooks.py", line 509, in configure_grub_debconf
            link = block.disk_to_byid_path(dev)
          File "/curtin/curtin/block/__init__.py", line 869, in disk_to_byid_path
            mapping = get_dev_disk_byid()
          File "/curtin/curtin/block/__init__.py", line 861, in get_dev_disk_byid
            return _get_dev_disk_by_prefix('/dev/disk/by-id')
          File "/curtin/curtin/block/__init__.py", line 846, in _get_dev_disk_by_prefix
            for path in os.listdir(prefix)]
        FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-id'
        [Errno 2] No such file or directory: '/dev/disk/by-id'
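
The traceback ends in `os.listdir` being called on `/dev/disk/by-id`, a directory that does not exist on this node. The failure mode, and a tolerant alternative, can be sketched as below. `list_disk_links` is a hypothetical helper written for illustration; it is not curtin's code and not necessarily how the issue was fixed upstream.

```python
import os


def list_disk_links(prefix):
    """Map each link under `prefix` to the device path it resolves to.

    Hypothetical helper, not curtin's implementation. It returns an empty
    mapping when `prefix` is missing instead of raising FileNotFoundError.
    """
    if not os.path.isdir(prefix):
        return {}
    mapping = {}
    for name in os.listdir(prefix):
        path = os.path.join(prefix, name)
        mapping[path] = os.path.realpath(path)
    return mapping
```

On the affected node, `list_disk_links('/dev/disk/by-id')` would return an empty mapping, leaving the caller to fall back to another identifier such as the plain device path.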

VM kernel: Linux scalebot-dev-controller5 4.15.0-117-generic #118-Ubuntu SMP Fri Sep 4 20:02:41 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: curtin (not installed)
ProcVersionSignature: Ubuntu 4.15.0-117.118-generic 4.15.18
Uname: Linux 4.15.0-117-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.9-0ubuntu7.17
Architecture: amd64
Date: Thu Sep 17 04:38:32 2020
SourcePackage: curtin
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Taihsiang Ho (tai271828) wrote :
Revision history for this message
Taihsiang Ho (tai271828) wrote :

More logs from curtin and /var/log

Taihsiang Ho (tai271828) wrote :

The log file in comment 1 was fetched from the VM node by logging into the target deployment node.

Taihsiang Ho (tai271828) wrote :

I did not see the /dev/disk/by-id directory. However, /dev/disk/by-uuid is present on the VM node:

ubuntu@scalebot-dev-controller5:~$ ls /dev/
block ecryptfs initctl loop2 mem ppp shm tty1 tty17 tty24 tty31 tty39 tty46 tty53 tty60 ttyS1 ttyS17 ttyS24 ttyS31 uhid vcs5 vcsa6 zfs
btrfs-control fd input loop3 memory_bandwidth psaux snapshot tty10 tty18 tty25 tty32 tty4 tty47 tty54 tty61 ttyS10 ttyS18 ttyS25 ttyS4 uinput vcs6 vda
bus full kmsg loop4 mqueue ptmx snd tty11 tty19 tty26 tty33 tty40 tty48 tty55 tty62 ttyS11 ttyS19 ttyS26 ttyS5 urandom vcsa vda1
char fuse lightnvm loop5 net pts stderr tty12 tty2 tty27 tty34 tty41 tty49 tty56 tty63 ttyS12 ttyS2 ttyS27 ttyS6 vcs vcsa1 vfio
console hpet log loop6 network_latency random stdin tty13 tty20 tty28 tty35 tty42 tty5 tty57 tty7 ttyS13 ttyS20 ttyS28 ttyS7 vcs1 vcsa2 vga_arbiter
core hugepages loop-control loop7 network_throughput rfkill stdout tty14 tty21 tty29 tty36 tty43 tty50 tty58 tty8 ttyS14 ttyS21 ttyS29 ttyS8 vcs2 vcsa3 virtio-ports
cpu_dma_latency hwrng loop0 mapper null rtc tty tty15 tty22 tty3 tty37 tty44 tty51 tty59 tty9 ttyS15 ttyS22 ttyS3 ttyS9 vcs3 vcsa4 vport0p1
disk i2c-0 loop1 mcelog port rtc0 tty0 tty16 tty23 tty30 tty38 tty45 tty52 tty6 ttyS0 ttyS16 ttyS23 ttyS30 ttyprintk vcs4 vcsa5 zero
ubuntu@scalebot-dev-controller5:~$ ls /dev/disk
by-label by-partuuid by-path by-uuid
ubuntu@scalebot-dev-controller5:~$
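
The listing above shows `by-label`, `by-partuuid`, `by-path` and `by-uuid`, but no `by-id`. The same check can be sketched in Python as below. It assumes, as later comments in this report suggest, that a missing virtio disk serial is the cause. The helper name and the sysfs location `/sys/block/<dev>/serial` are assumptions that should be verified on the target kernel.

```python
import os


def disk_link_report(dev="vda", dev_disk="/dev/disk", sys_block="/sys/block"):
    """Report which /dev/disk/by-* directories exist and the disk serial.

    Hypothetical helper. `<sys_block>/<dev>/serial` is assumed to hold the
    virtio disk serial; it is treated as empty when absent or unreadable.
    """
    try:
        present = sorted(os.listdir(dev_disk))
    except FileNotFoundError:
        present = []
    serial = ""
    try:
        with open(os.path.join(sys_block, dev, "serial")) as fh:
            serial = fh.read().strip()
    except OSError:
        pass
    return {"by_dirs": present, "has_by_id": "by-id" in present, "serial": serial}
```

Calling `disk_link_report()` on the node and finding `has_by_id` false together with an empty `serial` would be consistent with the explanation given later in this thread.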

Taihsiang Ho (tai271828) wrote :

There are already-deployed KVM nodes that are running well, so I am inclined to think the KVM host and its communication with MAAS are both fine.

Taihsiang Ho (tai271828)
description: updated
Patricia Domingues (patriciasd) wrote :

Some info about this issue:
1. When it was working (we were able to deploy the enlisted VMs), we had MAAS 2.8.1 and curtin version `curtin: Installation started. (19.3-26-g82f23e3d-0ubuntu1~18.04.1)`.
2. It is still possible to deploy VMs created via a MAAS KVM host of type virsh.
3. The VMs that we can no longer deploy were created via uvt-kvm (uvtool).
4. Creating a NEW VM via virt-install (virtinst) with an image from `https://cloud-images.ubuntu.com/` does work: we can deploy those VMs.
5. We see the same error on amd64 and arm64.

Robie Basak (racb) wrote :

I don't understand why you think there is a bug in uvtool here. Please could you explain and provide steps to reproduce specifically for uvtool?

Changed in uvtool:
status: New → Incomplete
Patricia Domingues (patriciasd) wrote :

Robie, I added uvtool only because we can no longer deploy enlisted VMs that were created via uvtool, which previously worked.

Ryan Harper (raharper) wrote :

I believe this is a duplicate of: https://bugs.launchpad.net/curtin/+bug/1876258
This has already been fixed in curtin, you can test with curtin's daily PPA, or curtin in groovy.

Regarding uvtool: like MAAS, uvtool should generate serial attributes for all virtio disks by default.

Changed in curtin:
status: New → Invalid
Patricia Domingues (patriciasd) wrote :

Ryan, thanks. I tried adding curtin's daily PPA by adding this to MAAS' curtin_userdata:
```
  add_repo: ["add-apt-repository", "-y", "ppa:curtin-dev/daily"]
  update: ["sh", "-c", "apt-get update"]
  install: ["sh", "-c", "apt install -y curtin"]
```

I can see in the MAAS logs that the latest curtin was installed:
```
After this operation, 866 kB of additional disk space will be used.
Get:1 http://ppa.launchpad.net/curtin-dev/daily/ubuntu bionic/main arm64 curtin-common all 20.1-891-gac7d5e5c-0ubuntu1+237~trunk~ubuntu18.04.1 [26.2 kB]
Get:2 http://ppa.launchpad.net/curtin-dev/daily/ubuntu bionic/main arm64 python3-curtin all 20.1-891-gac7d5e5c-0ubuntu1+237~trunk~ubuntu18.04.1 [173 kB]
Get:3 http://ppa.launchpad.net/curtin-dev/daily/ubuntu bionic/main arm64 curtin all 20.1-891-gac7d5e5c-0ubuntu1+237~trunk~ubuntu18.04.1 [19.8 kB]
```

but I still get the same error when trying to deploy the guest:
```
FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-id'
        [Errno 2] No such file or directory: '/dev/disk/by-id'
```

Ryan Harper (raharper) wrote :

Hi Patricia,

You'll need to upgrade MAAS itself. The curtin_userdata is what's sent to the nodes deploying.

Patricia Domingues (patriciasd) wrote :

Thanks again, Ryan.
Manually adding a serial to the virtio disks of the uvtool-created VMs makes them work again: we are able to deploy them via MAAS.
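
The manual workaround could be scripted along the lines of the sketch below, which adds a `<serial>` element to every virtio disk in a libvirt domain XML string that lacks one. The helper name and the serial naming scheme are hypothetical, chosen only for illustration. The updated XML would still need to be applied to the guest, for example by redefining the domain while it is shut off.

```python
import xml.etree.ElementTree as ET


def add_virtio_serials(domain_xml, prefix="disk"):
    """Return domain XML with a <serial> on each virtio disk lacking one.

    Hypothetical helper. Serials are generated as "<prefix>-<target dev>",
    an arbitrary scheme for this sketch; keep them short and unique.
    """
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        # Skip disks that are not attached on the virtio bus (e.g. an IDE CD-ROM).
        if target is None or target.get("bus") != "virtio":
            continue
        # Leave any serial that is already configured untouched.
        if disk.find("serial") is None:
            serial = ET.SubElement(disk, "serial")
            serial.text = "{}-{}".format(prefix, target.get("dev"))
    return ET.tostring(root, encoding="unicode")
```

The input would typically be the output of `virsh dumpxml <domain>`. Disks that already carry a serial are left unchanged, so the function can be run repeatedly.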

Taihsiang Ho (tai271828) wrote :

@Patricia I also tried deploying the scalebot charm using some of the VMs (scalebot-dev-controller2 and 3). They can be deployed now. Thanks for looking into this issue, and thanks @Ryan for your helpful information.
