Curtin doesn't clean up previous MD configuration

Bug #1618429 reported by Ante Karamatić on 2016-08-30
This bug affects 2 people
Affects          Status  Importance  Assigned to  Milestone
curtin                   High        Unassigned
curtin (Ubuntu)          Undecided   Unassigned
  Xenial                 Undecided   Unassigned

Bug Description

[Impact]

 * On some machines that have existing mdadm RAID metadata on one or
   more disks, curtin fails to remove this metadata when instructed
   to do so, and installation on such machines fails.

   Curtin has been updated to ignore mdadm assemble errors specifically
   in the case where curtin has been instructed to wipe a designated
   device. In the above case, curtin encountered an unexpected return
   code from the mdadm assemble command, which is not relevant since
   curtin is going to wipe the underlying device for re-installation.

[Test Case]

 * Install proposed curtin package and deploy to a machine with a
   partial mdadm raid array which cannot be properly assembled.

  PASS: Successfully deploy image with RAID configuration included.

  FAIL: Deployment fails with the following error:

    Command: ['mdadm', '--assemble', '--scan']
    Exit code: 3
    Reason: -
    Stdout: ''
    Stderr: u'mdadm: /dev/md/4 assembled from 3 drives - not
            enough to start the array.\n'

[Regression Potential]

 * Users who ask curtin to 'preserve' existing RAID configurations may
   be impacted.

[Original Description]

When deploying a machine in MAAS with an MD setup, deployment fails. Inspection shows that curtin doesn't clean up existing MD devices. On a failed machine I can see in dmesg:

[ 22.352672] md/raid1:md2: active with 2 out of 2 mirrors
[ 22.730212] md/raid1:md1: active with 2 out of 2 mirrors

These are MD devices from the previous deployment. Instead of deleting them, curtin tries to create a new one, so /proc/mdstat shows:

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md3 : inactive md1[1](S) md2[2](S)
      3125299568 blocks super 1.2

md1 : active raid1 sdd[1] sdc[0]
      1562649792 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sdf[1] sde[0]
      1562649792 blocks super 1.2 [2/2] [UU]

unused devices: <none>

MAAS's storage config appears to be correct.
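
For readers who want to check a node for this state, the /proc/mdstat output above can be inspected programmatically. The following is only a minimal sketch: the helper names parse_mdstat and stale_arrays are made up for illustration and are not part of curtin, and the parser handles only the line shapes that appear in the output quoted in this report.

```python
import re

# Sample copied from the /proc/mdstat output quoted in this report.
MDSTAT = """\
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md3 : inactive md1[1](S) md2[2](S)
      3125299568 blocks super 1.2

md1 : active raid1 sdd[1] sdc[0]
      1562649792 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sdf[1] sde[0]
      1562649792 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

# "md3 : inactive md1[1](S) md2[2](S)" or "md1 : active raid1 sdd[1] sdc[0]"
ARRAY_LINE = re.compile(r"^(md\d+) : (active|inactive)\s*(.*)$")
# A member token looks like "sdd[1]" or "md1[1](S)".
MEMBER = re.compile(r"^([A-Za-z0-9_-]+)\[\d+\]")


def parse_mdstat(text):
    """Return {array: {"state": ..., "members": [...]}} (hypothetical helper)."""
    arrays = {}
    for line in text.splitlines():
        match = ARRAY_LINE.match(line)
        if not match:
            continue
        name, state, rest = match.groups()
        members = []
        for token in rest.split():
            member = MEMBER.match(token)
            if member:  # skips the personality word, e.g. "raid1"
                members.append(member.group(1))
        arrays[name] = {"state": state, "members": members}
    return arrays


def stale_arrays(arrays):
    """Names of arrays that the kernel lists as inactive."""
    return sorted(n for n, info in arrays.items() if info["state"] == "inactive")
```

On the sample above, md3 is the only inactive array, and its members are md1 and md2, i.e. the arrays left over from the previous deployment.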

tags: added: 4010
Ante Karamatić (ivoks) wrote :

I forgot to mention, this is curtin in 14.04. I'd assume that's 0.1.0~bzr399-0ubuntu1~16.04.1.

Ryan Harper (raharper) wrote :

Hi Ante,

Can you attach the install.log reported back to maas? It should be on the
node details page.

https://gist.github.com/smoser/2610e9b78b8d7b54319675d9e3986a1b

Ante Karamatić (ivoks) wrote :

I don't have the log from that specific machine, because I had to move on. But
the error was pretty much the same as this one (maybe md3 instead of md4):

Error: /dev/sda: unrecognised disk label
An error occured handling 'sda': ProcessExecutionError - Unexpected error while running command.
Command: ['mdadm', '--assemble', '--scan']
Exit code: 3
Reason: -
Stdout: ''
Stderr: u'mdadm: /dev/md/4 assembled from 3 drives - not enough to start the array.\n'
Unexpected error while running command.
Command: ['mdadm', '--assemble', '--scan']
Exit code: 3
Reason: -
Stdout: ''
Stderr: u'mdadm: /dev/md/4 assembled from 3 drives - not enough to start the array.\n'
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'custom']
Exit code: 3
Reason: -
Stdout: "Error: /dev/sda: unrecognised disk label\nAn error occured handling 'sda': ProcessExecutionError - Unexpected error while running command.\nCommand: ['mdadm', '--assemble', '--scan']\nExit code: 3\nReason: -\nStdout: ''\nStderr: u'mdadm: /dev/md/4 assembled from 3 drives - not enough to start the array.\\n'\nUnexpected error while running command.\nCommand: ['mdadm', '--assemble', '--scan']\nExit code: 3\nReason: -\nStdout: ''\nStderr: u'mdadm: /dev/md/4 assembled from 3 drives - not enough to start the array.\\n'\n"
Stderr: ''

Ryan Harper (raharper) wrote :

OK, if you do run into it again or reproduce it, the entire log is
important, not just the stack trace.

Ante Karamatić (ivoks) wrote :

That's all there was in the installation log. Installation couldn't go further
without the disk, so that was all.

Ryan Harper (raharper) wrote :

The flow of execution for this bug is as follows.

curtin is instructed to wipe the superblock of the devices in the storage config. For each disk, curtin checks whether it may be part of an mdadm array, which determines how to wipe the mdadm metadata. In this process, curtin invokes mdadm --assemble --scan to discover the possible arrays. The current, tested code expects return codes of 0, 1 or 2. In your case, however, mdadm returned 3, which is an unexpected return code.

I've not been able to recreate the mdadm --assemble --scan return code of 3. The mdadm code does not easily reveal why it returns 3 in the above case, whereas I consistently get a return code of 1 when I recreate an array with missing members on trusty:

root@ubuntu:/curtin# mdadm --assemble --scan -vv
mdadm: looking for devices for /dev/md/0
mdadm: no RAID superblock on /dev/sdi
mdadm: no RAID superblock on /dev/sde
mdadm: no RAID superblock on /dev/sdc
mdadm: no RAID superblock on /dev/sdd
mdadm: no RAID superblock on /dev/sdb
mdadm: no RAID superblock on /dev/sda2
mdadm: no RAID superblock on /dev/sda1
mdadm: no RAID superblock on /dev/sda
mdadm: cannot open device /dev/sr0: No medium found
mdadm: no RAID superblock on /dev/vda
mdadm: /dev/sdh is identified as a member of /dev/md/0, slot 6.
mdadm: /dev/sdg is identified as a member of /dev/md/0, slot 5.
mdadm: /dev/sdf is identified as a member of /dev/md/0, slot 4.
mdadm: no uptodate device for slot 0 of /dev/md/0
mdadm: no uptodate device for slot 1 of /dev/md/0
mdadm: no uptodate device for slot 2 of /dev/md/0
mdadm: no uptodate device for slot 3 of /dev/md/0
mdadm: added /dev/sdg to /dev/md/0 as 5
mdadm: added /dev/sdh to /dev/md/0 as 6
mdadm: no uptodate device for slot 7 of /dev/md/0
mdadm: added /dev/sdf to /dev/md/0 as 4
mdadm: /dev/md/0 assembled from 3 drives - not enough to start the array.
root@ubuntu:/curtin# echo $?
1

That said, it's unlikely that we actually care about any assemble errors during the portion of the code where we're attempting to wipe a disk of any previous metadata. We're currently investigating a case where failure to assemble an array prevented new arrays from being constructed (LVM, for example, refuses to use devices with mdadm metadata present). If we can determine that assemble is required, we'll update the curtin mdadm code to allow us to catch and ignore errors during the wipe case as needed.
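
The idea of accepting return codes 0, 1 and 2 while optionally ignoring anything else during a wipe could look roughly like the sketch below. This is not curtin's actual code: the function name mdadm_assemble_scan, the ignore_errors parameter as written here and the injectable runner argument are all assumptions made for illustration.

```python
import subprocess

# Return codes the comment above describes as expected (0, 1 or 2).
EXPECTED_RCS = (0, 1, 2)


def mdadm_assemble_scan(ignore_errors=False, runner=subprocess.run):
    """Run `mdadm --assemble --scan` and return its exit code.

    Hypothetical helper, not curtin's real function. `runner` is
    injectable so the sketch can be exercised without mdadm installed.
    """
    cmd = ["mdadm", "--assemble", "--scan"]
    result = runner(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    unexpected = result.returncode not in EXPECTED_RCS
    if unexpected and not ignore_errors:
        # Outside the wipe path an unexpected code is still fatal.
        raise subprocess.CalledProcessError(
            result.returncode, cmd, output=result.stdout, stderr=result.stderr)
    return result.returncode
```

With this shape, the wipe path would call mdadm_assemble_scan(ignore_errors=True), so an exit code of 3 no longer aborts the install, while callers that need a working array keep the strict behaviour.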

Changed in curtin:
importance: Undecided → High
status: New → Confirmed
tags: added: maas
Joey Stanford (joey) on 2016-09-14
tags: added: bootstack
tags: added: canonical-bootstack
removed: bootstack
Tytus Kurek (tkurek) wrote :

This also affects another deployment. All details, including logs, have been provided at https://bugs.launchpad.net/maas/+bug/1623481

Ryan Harper (raharper) wrote :

A build should show up in the curtin-daily ppa in a bit:

https://code.launchpad.net/~curtin-dev/+archive/ubuntu/daily/

Changed in curtin:
status: Confirmed → Fix Committed
Tytus Kurek (tkurek) wrote :

Ryan,

After installing the 'curtin' package from the 'ppa:curtin-dev/daily' PPA on the MAAS node:

bootstack@os-maas-1:~$ sudo add-apt-repository ppa:curtin-dev/daily

 More info: https://launchpad.net/~curtin-dev/+archive/ubuntu/daily
Press [ENTER] to continue or ctrl-c to cancel adding it

gpg: keyring `/tmp/tmpbr4t6tkh/secring.gpg' created
gpg: keyring `/tmp/tmpbr4t6tkh/pubring.gpg' created
gpg: requesting key 0165013E from hkp server keyserver.ubuntu.com
gpg: /tmp/tmpbr4t6tkh/trustdb.gpg: trustdb created
gpg: key 0165013E: public key "Launchpad PPA for curtin developers" imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
OK
bootstack@os-maas-1:~$ sudo apt-get update
Ign http://us.archive.ubuntu.com trusty InRelease
Get:1 http://us.archive.ubuntu.com trusty-updates InRelease [65.9 kB]
Hit http://us.archive.ubuntu.com trusty-backports InRelease
Hit http://us.archive.ubuntu.com trusty Release.gpg
Get:2 http://us.archive.ubuntu.com trusty-updates/main Sources [381 kB]
Get:3 http://us.archive.ubuntu.com trusty-updates/restricted Sources [5,360 B]
Get:4 http://us.archive.ubuntu.com trusty-updates/universe Sources [164 kB]
Get:5 http://us.archive.ubuntu.com trusty-updates/multiverse Sources [7,126 B]
Hit http://security.ubuntu.com trusty-security InRelease
Get:6 http://ppa.launchpad.net trusty InRelease [16.0 kB]
Hit http://security.ubuntu.com trusty-security/main Sources
Hit http://security.ubuntu.com trusty-security/restricted Sources
Get:7 http://us.archive.ubuntu.com trusty-updates/main amd64 Packages [893 kB]
Hit http://ppa.launchpad.net trusty InRelease
Get:8 http://us.archive.ubuntu.com trusty-updates/restricted amd64 Packages [15.9 kB]
Get:9 http://us.archive.ubuntu.com trusty-updates/universe amd64 Packages [374 kB]
Get:10 http://us.archive.ubuntu.com trusty-updates/multiverse amd64 Packages [14.8 kB]
Hit http://security.ubuntu.com trusty-security/universe Sources
Get:11 http://us.archive.ubuntu.com trusty-updates/main i386 Packages [855 kB]
Hit http://ppa.launchpad.net trusty InRelease
Get:12 http://us.archive.ubuntu.com trusty-updates/restricted i386 Packages [15.6 kB]
Get:13 http://us.archive.ubuntu.com trusty-updates/universe i386 Packages [375 kB]
Get:14 http://us.archive.ubuntu.com trusty-updates/multiverse i386 Packages [15.2 kB]
Hit http://security.ubuntu.com trusty-security/multiverse Sources
Hit http://us.archive.ubuntu.com trusty-updates/main Translation-en
Hit http://us.archive.ubuntu.com trusty-updates/multiverse Translation-en
Hit http://us.archive.ubuntu.com trusty-updates/restricted Translation-en
Hit http://security.ubuntu.com trusty-security/main amd64 Packages
Hit http://us.archive.ubuntu.com trusty-updates/universe Translation-en
Hit http://us.archive.ubuntu.com trusty Release
Get:15 http://ppa.launchpad.net trusty/main amd64 Packages [1,109 B]
Hit http://us.arc...

Ryan Harper (raharper) wrote :

Can you provide the curtin config you send to the nodes?

https://gist.github.com/smoser/2610e9b78b8d7b54319675d9e3986a1b

It looks like some apt repo config is being sent and breaking/blocking
access to packages needed.

Ryan Harper (raharper) wrote :

The screenshot shows a "Stale file handle" line, which is a bug in
kernel overlayfs:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1618572

which is what I think you're seeing now.

Tytus Kurek (tkurek) wrote :

This happens when using default curtin config.

Tytus Kurek (tkurek) wrote :

An update. It looks like the issue shown in the attached "screenshots.tar.gz" file occurred after updating the boot source to point to "daily" instead of "releases", not after installing the "curtin" package from your PPA.

Tytus Kurek (tkurek) wrote :

I installed the "curtin" package from the "ppa:curtin-dev/daily" PPA on the MAAS node and attempted to deploy with Ubuntu Trusty and the "hwe-x" kernel, and the issue with mdadm still persists.

Ryan Harper (raharper) wrote :

Hi, in order to further investigate this bug we'll need to get some
information about the node. Please collect and attach the storage
configuration for the node. You can get this information with:

* maas 1.9 via cli
maas <session> node get-curtin-config <system-id>
maas <session> maas set-config name=curtin_verbose value=true

* maas 2.0 via cli
maas <session> machine get-curtin-config <system-id>

* Web UI:
On the node details page in the installation output section at the bottom
of the page

The cloud-init output logs from the failing host are of particular
importance, so we can see whether you're encountering a different issue
with mdadm wiping, versus what we've found to be the issue in the
original bug (mdadm assemble return codes when wiping disks).

Thanks,
Ryan

Tytus Kurek (tkurek) wrote :

Ryan,

Please find the output below:

bootstack@optionsit-os-maas-1:~$ maas maas-root node get-curtin-config node-40b72c26-7e68-11e6-9b2c-3c4a92b2d991
Success.
Machine-readable output follows:
apt_mirrors:
  ubuntu_archive: http://archive.ubuntu.com//ubuntu
  ubuntu_security: http://archive.ubuntu.com//ubuntu
apt_proxy: http://172.18.252.11:8000/
debconf_selections:
  maas: 'cloud-init cloud-init/datasources multiselect MAAS

    cloud-init cloud-init/maas-metadata-url string http://172.18.252.11/MAAS/metadata/

    cloud-init cloud-init/maas-metadata-credentials string oauth_token_key=qeRbdZp4SJvdEeCXvB&oauth_token_secret=MYec62ZQ8whbDGLp6tymSjVtKD7d5S4c&oauth_consumer_key=nYFBV3arCSuvbn4cft

    cloud-init cloud-init/local-cloud-config string apt_preserve_sources_list:
    true\napt_proxy: http://172.18.252.11:8000/\nmanage_etc_hosts: false\nmanual_cache_clean:
    true\nreporting:\n maas: {consumer_key: nYFBV3arCSuvbn4cft, endpoint: ''http://172.18.252.11/MAAS/metadata/status/node-40b72c26-7e68-11e6-9b2c-3c4a92b2d991'',\n token_key:
    qeRbdZp4SJvdEeCXvB, token_secret: MYec62ZQ8whbDGLp6tymSjVtKD7d5S4c,\n type:
    webhook}\nsystem_info:\n package_mirrors:\n - arches: [i386, amd64]\n failsafe:
    {primary: ''http://archive.ubuntu.com/ubuntu'', security: ''http://security.ubuntu.com/ubuntu''}\n search:\n primary:
    [''http://archive.ubuntu.com/ubuntu'']\n security: [''http://archive.ubuntu.com/ubuntu'']\n -
    arches: [default]\n failsafe: {primary: ''http://ports.ubuntu.com/ubuntu-ports'',
    security: ''http://ports.ubuntu.com/ubuntu-ports''}\n search:\n primary:
    [''http://ports.ubuntu.com/ubuntu-ports'']\n security: [''http://ports.ubuntu.com/ubuntu-ports'']\n

    '
install:
  log_file: /tmp/install.log
  post_files:
  - /tmp/install.log
kernel:
  mapping: {}
  package: linux-generic-lts-xenial
late_commands:
  maas:
  - wget
  - --no-proxy
  - http://172.18.252.11/MAAS/metadata/latest/by-id/node-40b72c26-7e68-11e6-9b2c-3c4a92b2d991/
  - --post-data
  - op=netboot_off
  - -O
  - /dev/null
network:
  config:
  - id: eth2
    mac_address: 14:02:ec:06:c8:ec
    mtu: 8900
    name: eth2
    subnets:
    - address: 172.18.252.22/23
      dns_nameservers: []
      gateway: 172.18.252.1
      type: static
    type: physical
  - id: eth0
    mac_address: 24:8a:07:0d:de:7c
    mtu: 8900
    name: eth0
    subnets:
    - type: manual
    type: physical
  - id: eth5
    mac_address: 14:02:ec:06:c8:ef
    mtu: 8900
    name: eth5
    subnets:
    - type: manual
    type: physical
  - id: eth4
    mac_address: 14:02:ec:06:c8:ee
    mtu: 8900
    name: eth4
    subnets:
    - type: manual
    type: physical
  - id: eth3
    mac_address: 14:02:ec:06:c8:ed
    mtu: 8900
    name: eth3
    subnets:
    - type: manual
    type: physical
  - id: eth1
    mac_address: 24:8a:07:0d:de:7d
    mtu: 8900
    name: eth1
    subnets:
    - type: manual
    type: physical
  - address: 172.18.252.11
    search:
    - ny4.oit-ops.com
    type: nameserver
  version: 1
network_commands:
  builtin:
  - curtin
  - net-meta
  - custom
partitioning_commands:
  builtin:
  - curtin
  - block-meta
  - custom
powe...

Ryan Harper (raharper) wrote :

Thanks for the configs and logs. It appears to me that the node is not
running the newer curtin. The updated curtin package runs
mdadm --assemble --scan -v, which I'm not seeing in the
cloud-init-output.log.

If you can obtain the install.log that curtin sends back to maas, I'm told
that's on the Node details page. Or, if you have access to the endpoint,
maas configures curtin to write it out to /tmp/install.log.

That would be the most helpful.

Tytus Kurek (tkurek) wrote :

These nodes are no longer available for testing at the moment. I'll provide an update if anything changes here.

Ryan Harper (raharper) on 2016-10-03
description: updated

Hello Ante, or anyone else affected,

Accepted curtin into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr425-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
Andy Whitcroft (apw) on 2016-10-05
Changed in curtin (Ubuntu):
status: New → Fix Released
Changed in curtin (Ubuntu Xenial):
status: New → Fix Committed
Jon Grimm (jgrimm) wrote :

Added José Pekkarinen in the hope of getting the xenial-proposed fix for this bug verified.

Ryan Harper (raharper) wrote :

I'd like to add some comments for the SRU verification.

This issue was discovered on a customer's system which is not currently available for verification. The error in the original bug occurred when mdadm assemble was called on an incomplete array (some members not available) and produced a return code of '2'. The curtin code that handles mdadm assembly was not expecting a return code of '2' and exited.

After examination of the process, curtin is attempting to assemble any arrays to discover any block device dependencies in an effort to wipe them all clean (ensuring that we can install and reboot into a system configured as expected). We determined that during storage preparation curtin does not care specifically if assembly fails as it was going to *wipe* the target block devices in any case; thus adding a flag to ignore return codes from mdadm assembly during this phase was the recommended solution.
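
To illustrate the "block device dependencies" mentioned above: on Linux, the devices stacked on top of a block device are listed in its holders directory under sysfs (/sys/class/block/<name>/holders). The sketch below walks that directory to produce an order in which stacked devices could be wiped, topmost first. It is an illustration only, not curtin's clear_holders logic; holders_of and wipe_order are made-up names, and sysfs_root is a parameter only so the sketch can be tried against a scratch directory.

```python
import os


def holders_of(name, sysfs_root="/sys/class/block"):
    """List devices that sit directly on top of `name` (e.g. md arrays)."""
    path = os.path.join(sysfs_root, name, "holders")
    if not os.path.isdir(path):
        return []
    return sorted(os.listdir(path))


def wipe_order(name, sysfs_root="/sys/class/block"):
    """Return `name` and everything stacked on it, topmost holders first."""
    order = []
    seen = set()

    def visit(dev):
        if dev in seen:
            return
        seen.add(dev)
        for holder in holders_of(dev, sysfs_root):
            visit(holder)
        # Appended only after all of its holders: post-order traversal.
        order.append(dev)

    visit(name)
    return order
```

For the layout in the original description, where md3 is built from md1 and md1 from sdc, this ordering would list md3 before md1 before sdc, so that nothing is wiped while something above it still holds it.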

We were not able to simulate this particular failure, so instead we have a unittest which explicitly returns a value of 2 to simulate the situation and ensure that during our prepare phase we ignore those errors.
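
A test of this kind can be sketched with unittest.mock: instead of building a degraded array, a mock stands in for the assemble call and fails with the chosen exit code. The code below is an assumption-laden illustration, not the unit test that shipped with curtin; discover_arrays_before_wipe and the test class are hypothetical names.

```python
import subprocess
import unittest
from unittest import mock


def discover_arrays_before_wipe(assemble):
    """Try to assemble arrays; failure is not fatal since disks get wiped.

    Hypothetical function. `assemble` is any callable that raises
    subprocess.CalledProcessError when the command fails.
    """
    try:
        assemble()
        return True
    except subprocess.CalledProcessError:
        return False


class TestIgnoreAssembleErrors(unittest.TestCase):
    def test_nonzero_exit_is_ignored(self):
        # Simulate mdadm exiting with 2 without needing a degraded array.
        failing = mock.Mock(side_effect=subprocess.CalledProcessError(
            2, ["mdadm", "--assemble", "--scan"]))
        self.assertFalse(discover_arrays_before_wipe(failing))
        failing.assert_called_once_with()

    def test_success_passes_through(self):
        self.assertTrue(discover_arrays_before_wipe(mock.Mock()))
```

Saved to a file, the tests can be run with python -m unittest; the point is only that the failure path is exercised deterministically, which is what makes it usable when the real hardware is unavailable.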

Despite not recreating this specific error, we feel confident that the problem with wiping previous md devices is fixed in this release.

Jon Grimm (jgrimm) on 2016-10-14
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr425-0ubuntu1~16.04.1

---------------
curtin (0.1.0~bzr425-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  [ Scott Moser ]
  * debian/new-upstream-snapshot: add writing of debian changelog entries.

  [ Ryan Harper ]
  * New upstream snapshot.
    - unittest,tox.ini: catch and fix issue with trusty-level mock of open
    - block/mdadm: add option to ignore mdadm_assemble errors (LP: #1618429)
    - curtin/doc: overhaul curtin documentation for readthedocs.org
      (LP: #1351085)
    - curtin.util: re-add support for RunInChroot (LP: #1617375)
    - curtin/net: overhaul of eni rendering to handle mixed ipv4/ipv6 configs
    - curtin.block: refactor clear_holders logic into block.clear_holders and
      cli cmd
    - curtin.apply_net should exit non-zero upon exception. (LP: #1615780)
    - apt: fix bug in disable_suites if sources.list line is blank.
    - vmtests: disable Wily in vmtests
    - Fix the unittests for test_apt_source.
    - get CURTIN_VMTEST_PARALLEL shown correctly in jenkins-runner output
    - fix vmtest check_file_strippedline to strip lines before comparing
    - fix whitespace damage in tests/vmtests/__init__.py
    - fix dpkg-reconfigure when debconf_selections was provided.
      (LP: #1609614)
    - fix apt tests on non-intel arch
    - Add apt features to curtin. (LP: #1574113)
    - vmtest: easier use of parallel and controlling timeouts
    - mkfs.vfat: add force flag for formating whole disks (LP: #1597923)
    - block.mkfs: fix sectorsize flag (LP: #1597522)
    - block_meta: cleanup use of sys_block_path and handle cciss knames
      (LP: #1562249)
    - block.get_blockdev_sector_size: handle _lsblock multi result return
      (LP: #1598310)
    - util: add target (chroot) support to subp, add target_path helper.
    - block_meta: fallback to parted if blkid does not produce output
      (LP: #1524031)
    - commands.block_wipe: correct default wipe mode to 'superblock'
    - tox.ini: run coverage normally rather than separately
    - move uefi boot knowledge from launch and vmtest to xkvm

 -- Ryan Harper <email address hidden> Mon, 03 Oct 2016 13:43:54 -0500

Changed in curtin (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for curtin has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Ante Karamatić (ivoks) wrote :

I'm not sure this is fixed. We are still seeing problems:

MAAS's curtin version is 0.1.0~bzr425-0ubuntu1~16.04.1. The installation log and curtin-config (obfuscated) are provided as attachments.

Ryan Harper (raharper) wrote :

Is it possible to include the entire installation.log?

Do repeat deploys to the same node and same configuration fail consistently?

On Wed, Jan 18, 2017 at 6:50 AM, Ante Karamatić <
<email address hidden>> wrote:

> ** Attachment added: "installation.log"
> https://bugs.launchpad.net/curtin/+bug/1618429/+attachment/4805810/+files/installation.log
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1618429
>
> Title:
> Curtin doesn't clean up previous MD configuration
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/curtin/+bug/1618429/+subscriptions
>

Ante Karamatić (ivoks) wrote :

Yes, failures are consistent and happen even after 'erase disks' is enabled
in MAAS. They also happen on all nodes. I'm afraid I can't get any more
logs than this. This is all that MAAS can capture in this environment.

--
Ante Karamatić
<email address hidden>
Canonical

Ryan Harper (raharper) wrote :

On Wed, Jan 18, 2017 at 8:43 AM, Ante Karamatić <
<email address hidden>> wrote:

> Yes, failures are consistent and happen even after 'erase disks' is enabled
> in MAAS. They also happen on all nodes. I'm afraid I can't get any more
> logs than this. This is all that MAAS can capture in this environment.
>

Hrm, it would be *really* helpful to have the complete install log. In
particular, curtin dumps out the storage relationship of the disks before
attempting to apply the storage config.

For example, when I attempt to recreate this by redeploying the same config
to the same disks, I can see this output:

Current device storage tree:
vdc
|-- vdc1
|-- vdc2
| `-- md0
`-- vdc3
    `-- md1
vdd
|-- vdd1
|-- vdd2
| `-- md0
`-- vdd3
    `-- md1

This shows the relationship that was discovered and then cleared. We clear
the tree from the bottom up: wiping md superblocks and metadata, then
partitions, and then devices. This results in a successful clearing out of
the previous on-disk storage configuration, and the install succeeds.
Something is different between these systems, and the lack of debug info is
going to prevent us from coming to a resolution.
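
To make the "bottom up" ordering concrete, here is a minimal Python sketch. It is not curtin's actual code: the HOLDERS mapping and the clear_order() helper are hypothetical names, and the mapping is simply the tree printed above written out by hand. It only shows that md0 and md1 would be visited before the partitions they sit on, and the partitions before the whole disks.

```python
from typing import Dict, List

# Hypothetical data: each device maps to the devices stacked on top of it,
# copied by hand from the "Current device storage tree" output above.
HOLDERS: Dict[str, List[str]] = {
    "vdc": ["vdc1", "vdc2", "vdc3"],
    "vdc2": ["md0"],
    "vdc3": ["md1"],
    "vdd": ["vdd1", "vdd2", "vdd3"],
    "vdd2": ["md0"],
    "vdd3": ["md1"],
}


def clear_order(roots: List[str], holders: Dict[str, List[str]]) -> List[str]:
    """Return devices in wipe order: anything held on top of a device
    comes before the device itself (a post-order walk of the tree)."""
    order: List[str] = []
    seen = set()

    def visit(dev: str) -> None:
        # md0 and md1 appear under both disks; visit each device only once.
        if dev in seen:
            return
        seen.add(dev)
        for child in holders.get(dev, []):
            visit(child)
        order.append(dev)

    for root in roots:
        visit(root)
    return order


if __name__ == "__main__":
    for dev in clear_order(["vdc", "vdd"], HOLDERS):
        print(dev)
```

A single post-order walk is enough here because every holder of a device is visited before the device is appended, so an md array always precedes each of its member partitions.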

The full log details the steps and the lists of current block device holders.

If we have access to a node, then I'd like to get a node in failed state to
inspect.

https://gist.github.com/smoser/2610e9b78b8d7b54319675d9e3986a1b

That has details on how to stop a deploying node from powering off:

ssh ubuntu@$HOST_IP sudo touch /run/block-curtin-poweroff

If you can do that, then after a failed curtin install,
/var/log/curtin/install.log will contain everything we need to investigate
further.
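
Once the log is in hand, the storage tree section can be pulled out of it with something like the sketch below. The helper name extract_storage_tree() is hypothetical, and the layout it expects is an assumption based only on the excerpt quoted earlier in this comment: a line containing "Current device storage tree:" followed by the tree, ending at the first blank line. The real install.log may delimit the section differently.

```python
from typing import List, Optional

MARKER = "Current device storage tree:"


def extract_storage_tree(log_text: str, marker: str = MARKER) -> Optional[List[str]]:
    """Return the lines of the first storage tree dump in log_text.

    Assumption: the tree starts on the line after the marker and ends at
    the first blank line (or at end of file). Returns None if the marker
    is not present at all.
    """
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if marker in line:
            tree: List[str] = []
            for entry in lines[index + 1:]:
                if not entry.strip():
                    break
                tree.append(entry)
            return tree
    return None


if __name__ == "__main__":
    # Path taken from the comment above; run this on the failed node.
    with open("/var/log/curtin/install.log", errors="replace") as handle:
        tree = extract_storage_tree(handle.read())
    print("\n".join(tree) if tree is not None else "no storage tree found")
```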

Ryan Harper (raharper) wrote :

If we can get the full installation log with the same error, please open up a new bug against curtin. While the error mentioned in comment #30 is mdadm related, this bug addressed errors during mdadm scan of previous disks and the issue in comment #30 is something else.

Paolo de Rosa (paolo-de-rosa) wrote :

This bug is believed to be fixed in curtin 17.1. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released