Can't compose kvm host with lvm storage on maas 2.8.4

Bug #1921658 reported by Seyeong Kim
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Won't Fix
Medium
Unassigned
readline (Ubuntu)
Fix Released
Medium
Unassigned
Bionic
Incomplete
Medium
Seyeong Kim

Bug Description

[Impact]

I can't compose kvm host on maas 2.8.4 ( bionic)

I upgraded twisted and related component with pip but the symptom is the same.

MaaS 2.9.x in Focal works fine.

in 2.8.x, pexpect virsh vol-path should return [2] but returns [3]

[2]
/dev/maas_data_vg/8d4e8b04-4031-4a1b-b5f2-a8306192db11
[3]
2021-03-17 20:43:34 stderr: [error] Message: 'this is the result...\n'
2021-03-17 20:43:34 stderr: [error] Arguments: ([' ', '<3ef-46ca-87c8-19171950592f --pool maas_guest_lvm_vg', "error: command 'attach-disk' doesn't support option --pool"],)

sometimes it fails in

    def get_volume_path(self, pool, volume):
        """Return the path to the file from `pool` and `volume`."""
        output = self.run(["vol-path", volume, "--pool", pool])
        return output.strip()

sometimes failes in

    def get_machine_xml(self, machine):
        # Check if we have a cached version of the XML.
        # This is a short-lived object, so we don't need to worry about
        # expiring objects in the cache.
        if machine in self.xml:
            return self.xml[machine]

        # Grab the XML from virsh if we don't have it already.
        output = self.run(["dumpxml", machine]).strip()
        if output.startswith("error:"):
            maaslog.error("%s: Failed to get XML for machine", machine)
            return None

        # Cache the XML, since we'll need it later to reconfigure the VM.
        self.xml[machine] = output
        return output

I assume that run function has issue.

Command line virsh vol-path and simple pepect python code works fine.

Any advice for this issue?

Thanks.

[Test Plan]

0) deploy Bionic and MAAS 2.8

1) Create file to be used as loopback device

sudo dd if=/dev/zero of=lvm bs=16000 count=1M

2) sudo losetup /dev/loop39 lvm

3) sudo pvcreate /dev/loop39

4) sudo vgcreate maas_data_vg /dev/loop39

5) Save below xml:
<pool type='logical'>
<name>maas_guest_lvm_vg</name>
<source>
<name>maas_data_vg</name>
<format type='lvm2'/>
</source>
<target>
<path>/dev/maas_data_vg</path>
</target>
</pool>

6) virsh pool-create maas_guest_lvm_vg.xml

7) Add KVM host in MaaS

8) Attempt to compose a POD using storage pool maas_guest_lvm_vg

9) GUI will fail with:

Pod unable to compose machine: Unable to compose machine because: Failed talking to pod: Start tag expected, '<' not found, line 1, column 1 (<string>, line 1)

[Where problems could occer]

This patch is small peice of huge commit.
I tested by compiling test pkg with this patch. but actually it is kind of underlying library ( libreadline ), so It could affect to any application using libreadline.
e.g running command inside application by code can be affected.

[Other Info]

Seyeong Kim (seyeongkim)
description: updated
Seyeong Kim (seyeongkim)
description: updated
Revision history for this message
Alberto Donato (ack) wrote :

Hi, does the volume for the machine get created in the pool?

Changed in maas:
status: New → Incomplete
Revision history for this message
Seyeong Kim (seyeongkim) wrote :

@ack

yeah,

pexpect should return only result of command but in this case ( maas 2.8.4 with python 3.6 ( bionic )) it returns from in the middle of command
e.g <3ef-46ca-87c8-19171950592f --pool maas_guest_lvm_vg

you can check reproducer and can see this symptom easily

tags: added: sts
Changed in maas:
status: Incomplete → New
Revision history for this message
Alberto Donato (ack) wrote :

What ubuntu version is the virsh host running on?

Changed in maas:
status: New → Incomplete
Revision history for this message
Seyeong Kim (seyeongkim) wrote :

@ack

Bionic, it is

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

pexpect cmd and result.. for FYI

May 31 07:53:14 xtrusia sh[27020]: 2021-05-31 07:53:14 stdout: [info] Seyeong pexpect.send() : b'vol-path 9d2217b0-4433-48a7-9be9-32dc8552c498 --pool maas_guest_lvm_vg
' 41
May 31 07:53:14 xtrusia maas.drivers.pod.virsh: [info] Seyeong run : [' ', '<433-48a7-9be9-32dc8552c498 --pool maas_guest_lvm_vg', '/dev/maas_data_vg/9d2217b0-4433-48a7-9be9-32dc8552c498', '']

Changed in maas:
status: Incomplete → In Progress
Alberto Donato (ack)
Changed in maas:
status: In Progress → New
Revision history for this message
Brett Milford (brettmilford) wrote :

I've tried replicating this issue.

In my setup (bionic, 2.8.6 (8602-g.07cdffcaa-0ubuntu1~18.04.1) with logical storage pool setup as per the bug.

My setup consistently fails with
2021-06-08 18:31:41 provisioningserver.rpc.pods: [critical] rare-kodiak: Failed to compose machine: RequestedMachine(hostname='sure-cattle', architecture='amd64/generic', cores=1, memory=2048, block_devices=[RequestedMachineBlockDevice(size=8000000000, tags=['maas_guest_lvm_vg'])], interfaces=[RequestedMachineInterface(ifname=None, attach_name=None, attach_type=None, attach_options=None, requested_ips=[], ip_mode=None)], cpu_speed=None, known_host_interfaces=[])

In
File "/usr/lib/python3/dist-packages/provisioningserver/drivers/pod/virsh.py", line 815, in configure_pxe_boot
            doc = etree.XML(xml)

There pexpect appears to be consistently returning what was input to the 'virsh #' prompt as well as the output. On occasion appears to be returning a truncated version of the input commencing with '<' or some left over output from a previous command.

This is the rackd log with added print statements:
https://paste.ubuntu.com/p/7Kyj3hN66K/

Both the volume and the vm are defined/created on the node. Sometimes they even manage to partially commission despite the UI error.

Alberto Donato (ack)
Changed in maas:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 2.8.x
Revision history for this message
Seyeong Kim (seyeongkim) wrote :

building upstream libvirt 4.0.0 is working fine.
building from ubuntu bionic pkg is not working

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

building with ./configure & make is working fine with ubuntu bionic pkg.

checking debian build conf

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

when i build with this manually. I see error ( from rules )

./configure --build=x86_64-linux-gnu --prefix=/ --disable-silent-rules --disable-maintainer-mode --disable-dependency-tracking "--with-packager=Victor Manuel Tapia King <email address hidden> Thu, 18 Feb 2021 17:40:38 +0100" --with-packager-version=1ubuntu8.19 --with-default-editor=sensible-editor --disable-silent-rules --disable-rpath --with-qemu --with-qemu-user=libvirt-qemu --with-qemu-group=kvm --with-openvz --with-avahi --with-sasl --with-yajl --without-ssh2 --with-polkit --with-udev --with-storage-fs --with-storage-dir --with-storage-lvm --with-storage-iscsi --with-storage-disk --with-storage-sheepdog --with-storage-rbd --with-storage-gluster --with-storage-zfs --with-init-script=systemd --with-numactl --with-numad --without-selinux --with-apparmor --with-secdriver-apparmor --with-apparmor-profiles --with-esx --without-phyp --with-capng --enable-debug --with-macvtap --with-network --with-netcf --with-xen --with-libxl --with-vbox --with-lxc --with-dtrace --with-audit --without-hal --without-firewalld --without-attr --with-nss-plugin --with-wireshark-dissector

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

Please ignore my comment for now
sometimes it is working, sometimes not.
taking a look further

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

I tested anther thing.

I changed libreadline from 7 to 8 ( in addition to this, I needed to add libtinfo.so.6 as well, as a dependent lib )

I'm able to compose kvm host.

I'll confirm this again

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

https://pastebin.ubuntu.com/p/hqHBNbnpbq/

This code fixes this issue. actually this looks like readline issue.

I'm going to prepare debdiff if this can be SRUed ( though im not sure )

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

above code is part of this upstream commit

commit 8e6ccd0373d77b86ed37a9a7d232ccfea3d6670c (tag: readline-8.0, 8)
Author: Chet Ramey <email address hidden>
Date: Mon Jan 7 09:30:21 2019 -0500

    readline-8.0 distribution sources and documentation

Revision history for this message
Seyeong Kim (seyeongkim) wrote :
Changed in readline (Ubuntu):
status: New → Fix Released
Changed in readline (Ubuntu Bionic):
status: New → In Progress
assignee: nobody → Seyeong Kim (seyeongkim)
Mathew Hodson (mhodson)
Changed in readline (Ubuntu):
importance: Undecided → Medium
Changed in readline (Ubuntu Bionic):
importance: Undecided → Medium
Revision history for this message
Dan Streetman (ddstreet) wrote :

@seyeongkim could you add the SRU template to the bug description?

https://wiki.ubuntu.com/StableReleaseUpdates#SRU_Bug_Template

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

@ddstreet

sorry for late response, I updated description.

Thanks.

description: updated
Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

@seyeongkim hi, I'm working on sponsoring your patch right now.
I had to do some minor fixes in it:
1. Since we're adding delta in relation to Debian we should change version number to reflect that (e.g. 7.0-4ubuntu1).
2. The launchpad bug number has to follow the convention for the automation to correctly pick it up and close the bug automatically afterwards (i.e. LP: #1921658 and not "Closes: #1921658).
3. The patch header did not need the "+" signs in the beginning of the line.
4. There was a descriptive commit for one of the lines you backported (static int *local_prompt_newlines;) which you skipped. I think it should also be part of the backport as it explains why the variable is there in the first place.

I'll update this bug if needed.

Thanks for the patch Seyeong!

Revision history for this message
Robie Basak (racb) wrote :

SRU review

I don't see any explanation anywhere of what exactly is wrong in readline in Bionic or how it is being fixed. Normally I would expect to see such an explanation in both the SRU template and in the upstream patch description that is being cherry-picked. But in this case neither are present.

I don't think it's appropriate to SRU this without a full understanding of what is going on - especially when it's such a common library being patched. In particular, I don't think it's possible to assess regression risk with any confidence without a full understanding of what is being changed and why.

Please could you expand on this?

Changed in readline (Ubuntu Bionic):
status: In Progress → Incomplete
Revision history for this message
Dan Streetman (ddstreet) wrote :

sorry, @dgadomski has been out on vacation, i'll pick up the sponsoring work for this while he's out.

@seyeongkim, can you please clean up the sru template information? @racb is correct that the current info in the sru template doesn't explain anything about what the actual readline bug is or how it fixes anything.

Dan Streetman (ddstreet)
tags: added: sts-sponsor-ddstreet
Revision history for this message
Seyeong Kim (seyeongkim) wrote :

@ddstreet

Thanks for your remind.

I also researching how to find proper reproducer for this issue

I only checked that this patch fixed the symptom while I'm using MAAS as I described here.

Dan Streetman (ddstreet)
tags: added: sts-sponsor-dgadomski
removed: sts-sponsor-ddstreet
Revision history for this message
Dan Streetman (ddstreet) wrote :

@seyeongkim, after reviewing this bug and looking at some of the data as well as maas and pexpect code, it seems to me like this isn't related to readline, I think the problem is maas is using pexpect without disabling echo.

Can you try changing the VirshSSH() class constructor so that it passes echo=False to its superclass constructor, and see if that helps?

Revision history for this message
Dan Streetman (ddstreet) wrote :

> Can you try changing the VirshSSH() class constructor so that it passes echo=False to its superclass constructor, and see if that helps?

also - it seems like the class is trying to work around the fact it's not disabling echo, by eliding the first line of the reply, in the run() method - that will most likely need to be changed to simply use the entire 'result' instead of just 'result[1:]'. In fact the splitlines() followed by '\n'.join() probably should be removed and just return the entire decoded string.

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

@ddstreet

Thanks for your advice,

FYI, This symptom is only happening when backend storage is LVM, it is ok when we just use local storage

And I've checked decoded string before but it was the same (error). LVM has error, local is ok

But I'm trying to test you mentioned

Thanks

Revision history for this message
Dan Streetman (ddstreet) wrote :

@seyeongkim did you have a chance to look at using echo=False in the VirshSSH() class?

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

@ddstreet sorry I thought I replied for it.

I've changed it for testing and MAAS didn't work properly.

Normal functionality didn't work.

Revision history for this message
Dan Streetman (ddstreet) wrote :

> I've changed it for testing and MAAS didn't work properly.

did you investigate why? did you also adjust the code as i mentioned in comment 22?

Revision history for this message
Seyeong Kim (seyeongkim) wrote :

@ddstreet

yep I've checked that ( [:1] or something else )

This issue is only happening when back storage is LVM.

If it is not lvm, it is ok, only with lvm, pexepct returns weird string

2021-03-17 20:43:34 stderr: [error] Arguments: ([' ', '<3ef-46ca-87c8-19171950592f --pool maas_guest_lvm_vg', "error: command 'attach-disk' doesn't support option --pool"],)

I've checked MAAS code first at the time, but pexpect related code showed me the same result everywhere.

So I suspected that it is underlying issue. and compared libraries in maas snap, then I found out libreadline patch attached in this LP. ( as libradline is related to running command )

But I still can't find original libreadline reproducer or link between maas and libreadline yet.

I've asked this to libreadline's maintainer ( this patch's auther) but he also doesn't know, and the mentioned that this could be a side effect of patch.

Thanks

Revision history for this message
Jerzy Husakowski (jhusakowski) wrote :

Issue resolved in 2.9. MAAS 2.8 is outside of the support window.

Changed in maas:
milestone: 2.8.x → none
status: Triaged → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.