curtin does not recognize / partition created in MAAS

Bug #1774186 reported by Ashley Lai
This bug affects 1 person
Affects  Status   Importance  Assigned to  Milestone
curtin   Triaged  Medium      Unassigned

Bug Description

In MAAS I unmounted and removed the default / filesystem, created several partitions, and mounted one of the partitions as /. The deployment failed with the error below:
http://people.canonical.com/~alai/curtinError.png

On Xenial I did apt update and upgrade yesterday.

Ashley Lai (alai)
tags: added: cdo-qa cdo-qa-blocker cpe-onsite foundations-engine
Ashley Lai (alai) wrote :

After enabling curtin_verbose and deploying a different node, I see an error in the log.

https://pastebin.canonical.com/p/pQgkpKPWjQ/

ubuntu@client:~$ maas admin machine get-curtin-config yyrntp
Success.
Machine-readable output follows:
apt:
  preserve_sources_list: false
  primary:
  - arches:
    - default
    uri: http://archive.ubuntu.com/ubuntu
  proxy: http://10.216.2.0:8000/
  security:
  - arches:
    - default
    uri: http://archive.ubuntu.com/ubuntu
cloudconfig:
  maas-cloud-config:
    content: "#cloud-config\ndatasource:\n MAAS: {consumer_key: LAqgWt9EmHDmtYTRsN,\
      \ metadata_url: 'http://10.216.2.0/MAAS/metadata/',\n token_key: UYtwJ7U6gxyU3Tuktv,\
      \ token_secret: ync6pvJKHRC2p2Bwax6W4tdjj4UbraK6}\n"
    path: /etc/cloud/cloud.cfg.d/90_maas_cloud_config.cfg
  maas-datasource:
    content: 'datasource_list: [ MAAS ]'
    path: /etc/cloud/cloud.cfg.d/90_maas_datasource.cfg
  maas-reporting:
    content: "#cloud-config\nreporting:\n maas: {consumer_key: LAqgWt9EmHDmtYTRsN,\
      \ endpoint: 'http://10.216.2.0/MAAS/metadata/status/yyrntp',\n token_key:\
      \ UYtwJ7U6gxyU3Tuktv, token_secret: ync6pvJKHRC2p2Bwax6W4tdjj4UbraK6,\n type:\
      \ webhook}\n"
    path: /etc/cloud/cloud.cfg.d/90_maas_cloud_init_reporting.cfg
  maas-ubuntu-sso:
    content: '#cloud-config

      snappy: {email: <email address hidden>}

      '
    path: /etc/cloud/cloud.cfg.d/90_maas_ubuntu_sso.cfg
debconf_selections:
  grub2: grub2 grub2/update_nvram boolean false
  maas: 'cloud-init cloud-init/datasources multiselect MAAS

    cloud-init cloud-init/maas-metadata-url string http://10.216.2.0/MAAS/metadata/

    cloud-init cloud-init/maas-metadata-credentials string oauth_token_secret=ync6pvJKHRC2p2Bwax6W4tdjj4UbraK6&oauth_consumer_key=LAqgWt9EmHDmtYTRsN&oauth_token_key=UYtwJ7U6gxyU3Tuktv

    cloud-init cloud-init/local-cloud-config string apt:\n preserve_sources_list:
    false\n primary:\n - arches: [default]\n uri: http://archive.ubuntu.com/ubuntu\n proxy:
    http://10.216.2.0:8000/\n security:\n - arches: [default]\n uri: http://archive.ubuntu.com/ubuntu\napt_preserve_sources_list:
    true\napt_proxy: http://10.216.2.0:8000/\nmanage_etc_hosts: false\nmanual_cache_clean:
    true\nreporting:\n maas: {consumer_key: LAqgWt9EmHDmtYTRsN, endpoint: ''http://10.216.2.0/MAAS/metadata/status/yyrntp'',\n token_key:
    UYtwJ7U6gxyU3Tuktv, token_secret: ync6pvJKHRC2p2Bwax6W4tdjj4UbraK6,\n type:
    webhook}\nsystem_info:\n package_mirrors:\n - arches: [i386, amd64]\n failsafe:
    {primary: ''http://archive.ubuntu.com/ubuntu'', security: ''http://security.ubuntu.com/ubuntu''}\n search:\n primary:
    [''http://archive.ubuntu.com/ubuntu'']\n security: [''http://archive.ubuntu.com/ubuntu'']\n -
    arches: [default]\n failsafe: {primary: ''http://ports.ubuntu.com/ubuntu-ports'',
    security: ''http://ports.ubuntu.com/ubuntu-ports''}\n search:\n primary:
    [''http://ports.ubuntu.com/ubuntu-ports'']\n security: [''http://ports.ubuntu.com/ubuntu-ports'']\n

    '
early_commands:
  driver_00:
  - sh
  - -c
  - echo third party drivers not installed or necessary.
install:
  l...

Ashley Lai (alai) wrote :

rsyslog shows that it failed to load a kernel module.
https://pastebin.canonical.com/p/stPPjnQnyM/

Blake Rouse (blake-rouse) wrote :

The traceback in regiond.log is benign and unrelated to anything with storage.

You said you enabled curtin_verbose. Can you provide the installation log? With verbose on it will provide a lot more detail.

Scott Moser (smoser) wrote :

Ashley provided the curtin version as:
# dpkg -l | grep curtin
ii curtin-common 18.1-627-gf98eb1b-0ubuntu1~ubuntu16.04.1 all Library and tools for curtin installer
ii python3-curtin 18.1-627-gf98eb1b-0ubuntu1~ubuntu16.04.1 all Library and tools for curtin installer

Scott Moser (smoser) wrote :

The storage config provided in comment 1 lists the first partition on nvme0n1 twice. That is definitely an invalid config, as the ids are supposed to be unique within the config. I don't think it's related, but it does seem to imply something is awry. Why would MAAS send that twice?

  - device: nvme0n1
    flag: bios_grub
    id: nvme0n1-part1
    number: 1
    offset: 4194304B
    size: 1048576B
    type: partition
    wipe: zero
  - device: nvme0n1
    id: nvme0n1-part1
    name: nvme0n1-part1
    number: 1
    size: 999997571072B
    type: partition
    uuid: c18875d9-2342-4dfb-8f07-7609ce84bb48
    wipe: superblock
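
A minimal sketch of how the duplicate id above could be detected before a config is handed to curtin. This is not curtin's own validation code; the helper name duplicate_ids and the list-of-dicts shape (as if the YAML list had already been parsed) are assumptions made for illustration.

```python
from collections import Counter


def duplicate_ids(storage_entries):
    """Return the ids that appear more than once, in sorted order."""
    counts = Counter(entry["id"] for entry in storage_entries)
    return sorted(id_ for id_, n in counts.items() if n > 1)


# The two partition entries quoted above, reduced to a few keys.
entries = [
    {"device": "nvme0n1", "flag": "bios_grub", "id": "nvme0n1-part1",
     "number": 1, "type": "partition"},
    {"device": "nvme0n1", "id": "nvme0n1-part1", "name": "nvme0n1-part1",
     "number": 1, "type": "partition"},
]

print(duplicate_ids(entries))  # ['nvme0n1-part1']
```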

Ryan Harper (raharper) wrote :

This is likely related to something we've seen in our vmtest runs: a race between when the kernel updates sysfs after partitions are wiped and when we walk sysfs.

https://bugs.launchpad.net/curtin/+bug/1774042

If we can get the curtin install.log, or the rsyslog with the stack trace, we can confirm whether it's related and mark this as a dupe.

Scott Moser (smoser) wrote :

I've reproduced the issue here with a storage config like:
 http://paste.ubuntu.com/p/PhkZWZ7H2V/
When modified to this, it works fine:
 http://paste.ubuntu.com/p/frxG5JHhZR/

The issue is that the config has a partition number 2 (sda-part2) but does
not have a partition number 1. That config is strictly valid, but I believe
adding a partition number 1 will fix your issue.

Specifically, if you added a partition like:

  - device: sda
    id: sda-part1
    name: sda-part1
    number: 1
    size: 1M
    type: partition
    uuid: f3bf9e6b-d53d-454c-af79-c7ed3d027015
    wipe: superblock

You don't have to use it, it just needs to be there.
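
As a rough illustration of the condition described above, the following sketch reports disks that have partitions in the config but none with number 1. The function name disks_missing_partition_one and the sample entries are hypothetical; only the sda-part2 case comes from this report.

```python
def disks_missing_partition_one(storage_entries):
    """Return sorted device names that have partitions but none numbered 1."""
    numbers_by_device = {}
    for entry in storage_entries:
        if entry.get("type") != "partition":
            continue
        numbers_by_device.setdefault(entry["device"], set()).add(
            entry.get("number"))
    return sorted(dev for dev, nums in numbers_by_device.items()
                  if 1 not in nums)


# Illustrative entries: sda starts at partition 2, sdb starts at partition 1.
entries = [
    {"id": "sda-part2", "type": "partition", "device": "sda", "number": 2},
    {"id": "sdb-part1", "type": "partition", "device": "sdb", "number": 1},
]

print(disks_missing_partition_one(entries))  # ['sda']
```

A config flagged this way is a candidate for the placeholder partition shown above.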

Changed in curtin:
status: New → Triaged
Scott Moser (smoser) wrote :

Here is a stack trace of my run:
 http://paste.ubuntu.com/p/DyWRszykQJ/

Scott Moser (smoser)
Changed in curtin:
importance: Undecided → Medium
Ashley Lai (alai) wrote :

I should add that the root partition I created was 50GB and mounted as ext4 at /. The commissioning kernel was set to xenial (hwe-16.04-edge). The log tab in MAAS showed the following. Hope it helps.
https://pastebin.canonical.com/p/TS6CSqskJ5/

Ryan Harper (raharper) wrote :

Yeah, I see what's going on here.

Curtin didn't handle a None return from 'find_previous_partition'; we'll fix that.

But I don't think MAAS really wanted to generate a storage config with the first partition having number: 2. That is, if someone adds and removes partitions and mounts but ends up with the same config (one partition on a disk mounted at root, for example), then shouldn't the yaml it generates be the same?
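
The unhandled None mentioned above can be illustrated with a small sketch. This is not curtin's actual code: the find_previous_partition below is a hypothetical stand-in with an assumed signature, and previous_partition_id is an invented caller that shows the kind of guard that was missing.

```python
def find_previous_partition(device, number, storage_entries):
    """Hypothetical stand-in: return the partition entry on `device`
    numbered `number - 1`, or None if the config has no such entry."""
    for entry in storage_entries:
        if (entry.get("type") == "partition"
                and entry.get("device") == device
                and entry.get("number") == number - 1):
            return entry
    return None


def previous_partition_id(device, number, storage_entries):
    """Return the id of the preceding partition, or None if there is none."""
    previous = find_previous_partition(device, number, storage_entries)
    if previous is None:
        # The first partition present in the config may not be number 1,
        # so there may be nothing before it; do not index into None.
        return None
    return previous["id"]


# A config whose only partition on sda is number 2, as in this report.
entries = [
    {"id": "sda-part2", "type": "partition", "device": "sda", "number": 2},
]

print(previous_partition_id("sda", 2, entries))  # None
```

Without the None check, previous["id"] would raise a TypeError for this config.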

Ashley Lai (alai) wrote :

Is there a workaround for this issue or a fix to apply? This issue is on a paying customer's MAAS and they need to have this configuration. Thanks !!

The stack trace from my install.log is similar to smoser's.
https://pastebin.canonical.com/p/7MpddDD6S5/

Andres Rodriguez (andreserl) wrote :

Could you please specify which version of MAAS this is with?

Changed in maas:
status: New → Incomplete
Ashley Lai (alai) wrote :

root@dcs1-clm-inf5:/home/ubuntu# dpkg -l | grep maas
ii maas-cli 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1 all MAAS client and command-line interface
ii maas-common 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1 all MAAS server common files
ii maas-dhcp 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1 all MAAS DHCP server
ii maas-dns 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1 all MAAS DNS server
ii maas-proxy 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1 all Rack Controller for MAAS
ii maas-region-api 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1 all Region controller API service for MAAS
ii python3-django-maas 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1 all MAAS server provisioning libraries (Python 3)
root@dcs1-clm-inf5:/home/ubuntu#

Ashley Lai (alai) wrote :

On the MAAS UI it shows
MAAS version: 2.3.2 (6485-ge93e044-0ubuntu1~16.04.1)

Ashley Lai (alai) wrote :

@smoser - could you please give more info on the workaround? Do I have to create another sda-part1 partition or just add it? Also, where do I add the sda-part1, and where does the uuid come from? Thanks!!

Andres Rodriguez (andreserl) wrote :

OK, so I've tried to reproduce the issue where MAAS has no part1 but has a part2, and I've been unable to. Ashley, could you provide step-by-step instructions?

Thanks.

Ashley Lai (alai) wrote :

After the node is commissioned, unmount / and remove the partition for /. If you then create a new partition on this disk (let's call it sda), it will start at sda-part2. I tried another disk (sdb) and it always starts at part1, so the problem seems specific to the disk (sda) that originally held the / root partition.

Ashley Lai (alai) wrote :

@Andres - I see the issue now. When you create the partition on sda, it says sda-part1 but when you mount it, it changes to sda-part2.

Ashley Lai (alai) wrote :

Thanks smoser for the workaround suggestion. I'm using a different disk (sdb) to create the partition and it worked.

Ashley Lai (alai) wrote :

One of the nodes has NVMe as the root filesystem, and I can't work around this issue there. The customer uses the NVMe disk for the 'cache set', and when I create a partition on the NVMe disk it starts at part2, causing the deployment to fail.

Andres Rodriguez (andreserl) wrote :

@Ashley,

Is this an EFI system?

Ashley Lai (alai) wrote :

They have a mix of machines: 2 EFI and 7 non-EFI. For the EFI machines, the /boot/efi partition is created on part1, so I keep that partition, remove the / partition, and add more partitions to the same disk. The non-EFI machines are the ones with the issue, because I need to remove the / root partition and it's the only partition on the disk.

Andres Rodriguez (andreserl) wrote :

Removed MAAS from this bug as the reason why this was added is really https://bugs.launchpad.net/maas/+bug/1774058

no longer affects: maas
Ryan Harper (raharper) wrote : Re: [Bug 1774186] Re: curtin does not recognize / partition created in MAAS

But with curtin fixed for bcache, there's no need for this
"workaround" in the UI, which is generating a different config than it
would otherwise?

Why wouldn't MAAS ensure the first partition in the config have number: 1 ?

On Thu, May 31, 2018 at 3:22 PM, Andres Rodriguez
<email address hidden> wrote:
> Removed MAAS from this bug as the reason why this was added is really
> https://bugs.launchpad.net/maas/+bug/1774058
>
> ** No longer affects: maas

David Britton (dpb) wrote :

We think this particular issue was fixed in MAAS bug 1774058. If you find something else, please raise it separately.
