MAAS deployment failed with vgchange command

Bug #1847058 reported by Po-Hsu Lin
This bug affects 1 person
Affects              Status   Importance  Assigned to  Milestone
MAAS                 Invalid  Undecided   Unassigned
ubuntu-kernel-tests  Triaged  Undecided   Unassigned
curtin (Ubuntu)      Triaged  Medium      Unassigned

Bug Description

Issue found on our MAAS server, MAAS version: 2.6.1 (7832-g17912cdc9-0ubuntu1~18.04.1)

Trying to deploy Bionic on an AMD64 bare-metal machine.

Deployment log shows it has failed with:

        curtin.util.ProcessExecutionError: Unexpected error while running command.
        Command: ['vgchange', '--activate=y']
        Exit code: 5
        Reason: -
        Stdout: 0 logical volume(s) in volume group "vg_459" now active
        Stderr: /usr/sbin/thin_check: execvp failed: No such file or directory
        Check of pool vg_459/pool_459 failed (status:2). Manual repair required!
        /usr/sbin/thin_check: execvp failed: No such file or directory

Please find the attachment for the complete deployment log from MAAS UI.
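
For triage across many deployment logs, the failure signature shown above can be picked out mechanically. The following is only a minimal sketch, not part of curtin or MAAS: the helper names `parse_curtin_error` and `missing_thin_check` are hypothetical, and the field labels are assumed to match the excerpt in this report.

```python
import re

# Field labels as they appear in the error dump quoted in this report.
_FIELD = re.compile(r"^(Command|Exit code|Reason|Stdout|Stderr):\s*(.*)$")


def parse_curtin_error(log_text):
    """Split a ProcessExecutionError dump into a dict of labelled fields.

    Hypothetical helper for illustration; lines that follow a label and
    carry no label of their own are appended to the preceding field.
    """
    fields = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = _FIELD.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current and line:
            fields[current] += "\n" + line
    return fields


def missing_thin_check(fields):
    """Return True if stderr shows that thin_check could not be executed."""
    return "thin_check: execvp failed" in fields.get("Stderr", "")
```

Applied to the excerpt above, `parse_curtin_error` yields the command, the exit code and the multi-line stderr as separate entries, which `missing_thin_check` can then test.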

Po-Hsu Lin (cypressyew) wrote :
Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew) wrote :

It looks like this deployment failure is not happening on all the nodes; some of them can still be deployed without any issue.

It's affecting:
 * bavor (Eoan)
 * rizzo (B, D)
 * s2lp6g001 s390x KVM (X, E)
 * s2lp6g003 s390x KVM (D)

Changed in maas:
status: New → Invalid
Dan Watkins (oddbloke) wrote :

Thanks for the bug report! It looks to me like the problem here is that the lvm2 package only Suggests thin-provisioning-tools (up to disco), which is what provides the thin_check binary.

I would expect this to be resolved in recent eoan images (as thin-provisioning-tools is now a Recommends of ; can you confirm how recent the eoan images being used for testing are?

Dan Watkins (oddbloke) wrote :

Sorry, that second paragraph should be:

I would expect this to be resolved in recent eoan images (as thin-provisioning-tools is now a Recommends of lvm2 and therefore pulled in during image build); can you confirm how recent the eoan images being used for testing are?
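
Whether a given environment actually has the binary can be checked directly. This is a small sketch under stated assumptions: `find_thin_check` is a hypothetical helper name, and it only looks for an executable called `thin_check` on a search path; it does not query the package manager.

```python
import shutil


def find_thin_check(search_path=None):
    """Return the full path to thin_check, or None if it is not found.

    search_path is an optional PATH-style string; when omitted, the
    PATH of the current process is searched.
    """
    return shutil.which("thin_check", path=search_path)


if __name__ == "__main__":
    location = find_thin_check()
    if location is None:
        # Per the comment above, thin-provisioning-tools provides this binary.
        print("thin_check not found; is thin-provisioning-tools installed?")
    else:
        print("thin_check found at", location)
```

The log in the bug description shows LVM invoking `/usr/sbin/thin_check`, so the check is only meaningful when run in the same environment that runs `vgchange`.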

Po-Hsu Lin (cypressyew) wrote :

Hi,
thanks for the reply. I re-synced the Eoan image today and tried to deploy it on a node that did not work before (bavor). It is still failing, but the error message is a bit different:

    curtin.util.ProcessExecutionError: Unexpected error while running command.
    Command: ['vgscan', '--mknodes']
    Exit code: 5
    Reason: -
    Stdout: Reading all physical volumes. This may take a while...
              Found volume group "vg_459" using metadata type lvm2

    Stderr: Command failed with status code 5.

Please find the attachment for the output from maas.

Po-Hsu Lin (cypressyew) wrote :

BTW, from our test log on the Jenkins server, it looks like this issue first occurred 3 days 19 hrs ago.

And this is blocking the kernel SRU testing to some extent.

Changed in ubuntu-kernel-tests:
status: New → Confirmed
Ryan Harper (raharper) wrote :

Curtin hasn't made any changes here, but I suspect the workload on the target machine that created the thin LVs has left metadata that curtin doesn't yet know how to clear. As a workaround for now, you can use MAAS disk erasure to clear the volumes before attempting to deploy.

https://maas.io/docs/disk-erasure

Ryan Harper (raharper)
Changed in curtin (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Sean Feole (sfeole) wrote :

To answer one of Dan's questions from comment #4:

The MAAS server uses the images from maas.io. As Sam mentioned, the problem originated last week. The systems were installed with "Bionic", I believe. I'll look into the MAAS disk erasure feature. Thanks, Ryan.

Sean Feole (sfeole) wrote :

The workaround proposed by Ryan worked. MAAS disk erase appears to have fixed the problem; I'll make sure to do that on any of the affected systems.

Changed in ubuntu-kernel-tests:
status: Confirmed → Triaged
Po-Hsu Lin (cypressyew) wrote :

Disk erasure was performed on bavor, and bavor is now working again.
