curtin error on arm64 wily deployment for xgene-2 SoC

Bug #1520400 reported by Newell Jensen
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
curtin    Fix Released   Undecided    Unassigned

Bug Description

While trying to deploy hwe-w/wily images using MAAS on an xgene-2 SoC, I ran into this error:

http://paste.ubuntu.com/13521957/

I searched for an existing bug and thought that #1499869 might be related. I see that cloud-init changed fp.read() to fp, so I made the same change in url_helpers.py in the curtin that is installed on the MAAS server (I changed the only fp.read() in that file to fp). With that change I got past the error, as seen here:

http://paste.ubuntu.com/13521949/

The error above is because the root-tgz doesn't have a needed flash-kernel fix that I added to the root-image ephemeral. However, this shows that there is indeed an issue with curtin at the moment.
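
To make the fp.read() versus fp change concrete, here is a minimal sketch of the mechanical difference between the two. It is not curtin's or cloud-init's actual code: send_body is a hypothetical stand-in for whatever consumes the data, and it is assumed to accept either bytes or a binary file-like object.

```python
import io


def send_body(data, chunk_size=4):
    """Hypothetical consumer: accepts bytes or a binary file-like object.

    Returns the list of chunks it would have sent.
    """
    if isinstance(data, (bytes, bytearray)):
        # The whole payload is already in memory; send it in one piece.
        return [bytes(data)]
    chunks = []
    while True:
        chunk = data.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks


payload = b"0123456789"

# Variant 1 (fp.read()): the caller reads everything, then hands over bytes.
eager = send_body(io.BytesIO(payload).read())

# Variant 2 (fp): the caller hands over the file object and the consumer
# reads from it in chunks.
streamed = send_body(io.BytesIO(payload))
```

Both variants deliver the same bytes; what differs is who reads the file and when. The sketch makes no claim about why the change avoided the error in this deployment.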

ubuntu@newell-new:~$ apt-cache policy python-curtin
python-curtin:
  Installed: 0.1.0~bzr314-0ubuntu1
  Candidate: 0.1.0~bzr314-0ubuntu1
  Version table:
 *** 0.1.0~bzr314-0ubuntu1 0
        500 http://ppa.launchpad.net/maas/next/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
     0.1.0~bzr221-0ubuntu1~14.04.1 0
        500 http://archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
     0.1.0~bzr126-0ubuntu1 0
        500 http://archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

ubuntu@newell-new:~$ apt-cache policy maas
maas:
  Installed: 1.9.0~beta1+bzr4417-0ubuntu1~trusty1
  Candidate: 1.9.0~rc2+bzr4509-0ubuntu1~trusty1
  Version table:
     1.9.0~rc2+bzr4509-0ubuntu1~trusty1 0
        500 http://ppa.launchpad.net/maas/next/ubuntu/ trusty/main amd64 Packages
 *** 1.9.0~beta1+bzr4417-0ubuntu1~trusty1 0
        100 /var/lib/dpkg/status
     1.7.6+bzr3376-0ubuntu2~14.04.1 0
        500 http://archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
     1.5.4+bzr2294-0ubuntu1.2 0
        500 http://security.ubuntu.com/ubuntu/ trusty-security/main amd64 Packages
     1.5+bzr2252-0ubuntu1 0
        500 http://archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
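
The policy output above shows python-curtin with matching Installed and Candidate versions, while maas has a newer Candidate than the Installed version. A small sketch of pulling those two fields out of apt-cache policy text follows; the helper name installed_and_candidate is made up for illustration, and the sample text is copied from the maas output above.

```python
import re

# Excerpt of the `apt-cache policy maas` output quoted in this report.
POLICY_OUTPUT = """\
maas:
  Installed: 1.9.0~beta1+bzr4417-0ubuntu1~trusty1
  Candidate: 1.9.0~rc2+bzr4509-0ubuntu1~trusty1
"""


def installed_and_candidate(policy_text):
    """Return (installed, candidate) parsed from `apt-cache policy` text."""
    fields = dict(
        re.findall(
            r"^[ \t]+(Installed|Candidate):[ \t]+(\S+)$",
            policy_text,
            flags=re.MULTILINE,
        )
    )
    return fields.get("Installed"), fields.get("Candidate")


installed, candidate = installed_and_candidate(POLICY_OUTPUT)

# True when apt would pick a different version than the one installed.
upgrade_pending = installed != candidate
```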

It should be noted that even when changing fp.read() to fp, the output would sometimes differ from the above. Sometimes I also see this now (which could be because more changes need to be made):

http://paste.ubuntu.com/13522047/

Scott Moser (smoser) wrote :

The second paste http://paste.ubuntu.com/13522047/ seems to show the primary failure is:
[ 111.718754] cloud-init[1139]: /run/lvm/lvmetad.socket: connect failed: No such file or directory
[ 111.730177] cloud-init[1139]: WARNING: Failed to connect to lvmetad. Falling back to internal scanning.

It would seem that something caused lvm to not correctly start on installation.
Could you please get a failure like that and provide the full log?

Ryan Harper (raharper) wrote :

The lvmetad failure is not fatal, but this appears to be:

[ 112.498505] cloud-init[1139]: /dev/vgroot/lvroot: not found: device not cleared
[ 112.499834] cloud-init[1139]: Aborting. Failed to wipe start of new LV.
[ 116.534735] cloud-init[1139]: An error occured handling 'vgroot-lvroot': ProcessExecutionError - Unexpected error while running command.

It would be good to see the storage config passed to curtin, and a clean run with the current "fixes" (fp vs fp.read() and flash-kernel) applied, to see what the current error looks like.
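
The distinction drawn in this thread (the lvmetad connection failure is a warning that falls back to internal scanning, while "Aborting." is the real failure) can be sketched as a simple log filter. The patterns below are illustrative assumptions drawn only from the messages quoted in this report; they are not a complete or authoritative list of curtin or LVM failure signatures.

```python
import re

# Assumed patterns, based solely on the console lines quoted in this bug.
FATAL_PATTERNS = [
    re.compile(r"Aborting\."),
    re.compile(r"ProcessExecutionError"),
]
WARNING_PATTERNS = [
    re.compile(r"WARNING:"),
    re.compile(r"connect failed"),
]


def classify(line):
    """Label a console line as 'fatal', 'warning', or 'info'."""
    if any(p.search(line) for p in FATAL_PATTERNS):
        return "fatal"
    if any(p.search(line) for p in WARNING_PATTERNS):
        return "warning"
    return "info"


# Lines copied from the second paste, with email line wraps rejoined.
LOG = [
    "[ 111.718754] cloud-init[1139]: /run/lvm/lvmetad.socket: connect failed: No such file or directory",
    "[ 111.730177] cloud-init[1139]: WARNING: Failed to connect to lvmetad. Falling back to internal scanning.",
    "[ 112.498505] cloud-init[1139]: /dev/vgroot/lvroot: not found: device not cleared",
    "[ 112.499834] cloud-init[1139]: Aborting. Failed to wipe start of new LV.",
]

# The first line that matches a fatal pattern, or None if there is none.
first_fatal = next((line for line in LOG if classify(line) == "fatal"), None)
```

Checking for fatal patterns before warning patterns keeps a line that contains both from being downgraded to a warning.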

(In reply to Scott Moser's comment above, sent Fri, Dec 4, 2015 at 9:12 AM.)

Newell Jensen (newell-jensen) wrote :

We are in the process of getting new boards with BMCs and will respond to the above comments once that is done.

Newell Jensen (newell-jensen) wrote :

Here is some more information on my efforts to get xgene-2 working with MAAS.

With the correct patches in place I am now seeing this error with my deployment:

Processing triggers for initramfs-tools (0.103ubuntu4.2) ...
update-initramfs: Generating /boot/initrd.img-3.13.0-71-generic
WARNING: missing /lib/modules/3.13.0-71-generic
Device driver support needs thus be built-in linux image!
depmod: ERROR: could not open directory /lib/modules/3.13.0-71-generic: No such file or directory
depmod: FATAL: could not search modules: No such file or directory
cryptsetup: WARNING: failed to detect canonical device of /media/root-ro/
cryptsetup: WARNING: could not determine root device from /etc/fstab
W: mdadm: /etc/mdadm/mdadm.conf defines no arrays.
depmod: WARNING: could not open /tmp/mkinitramfs_ctfNOX/lib/modules/3.13.0-71-generic/modules.order: No such file or directory
depmod: WARNING: could not open /tmp/mkinitramfs_ctfNOX/lib/modules/3.13.0-71-generic/modules.builtin: No such file or directory
Unsupported platform on EFI system, doing nothing.
Processing triggers for ureadahead (0.100.0-16) ...
Processing triggers for ufw (0.34~rc-0ubuntu2) ...
mdadm: No arrays found in config file or automatically
Creating new GPT entries.
The operation has completed successfully.

I am not entirely sure why the headers would be missing and cause this error, because they are there when I examine the root-image. It seems there may be an issue with overlayroot not working properly.
For the entire deployment console output, please see:

http://paste.ubuntu.com/14125038/
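
The depmod errors in the output above come down to a missing /lib/modules/3.13.0-71-generic directory. One way to check a root tree for that condition is sketched below. The helper name missing_module_dirs and the version string 4.2.0-example are hypothetical, and the demonstration runs against a throwaway temporary directory rather than a real root-image.

```python
import os
import tempfile


def missing_module_dirs(root, kernel_versions):
    """Return the kernel versions with no lib/modules/<version> under root."""
    return [
        version
        for version in kernel_versions
        if not os.path.isdir(os.path.join(root, "lib", "modules", version))
    ]


# Build a fake root that has modules for one (made-up) kernel version only.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "lib", "modules", "4.2.0-example"))
    missing = missing_module_dirs(
        root, ["3.13.0-71-generic", "4.2.0-example"]
    )
```

Pointing root at an unpacked root-tgz or a mounted root-image would show whether the kernel version named in the update-initramfs line has a matching modules directory there.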

Ryan Harper (raharper) wrote :

Check how much memory is available. The overlayroot filesystems (root and /tmp) are housed in host RAM, so if you have too little, writes can fail.
I've seen this when testing in VMs with too little RAM.
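
A quick way to act on this suggestion is to read /proc/meminfo on the machine whose RAM backs the overlay. The sketch below parses meminfo-style text; the helper name meminfo_field_kib and the sample numbers are made up for illustration. It falls back to MemFree because the MemAvailable field is not present on older kernels.

```python
def meminfo_field_kib(meminfo_text, field):
    """Return the value (in kB, as reported) of a /proc/meminfo field, or None."""
    prefix = field + ":"
    for line in meminfo_text.splitlines():
        if line.startswith(prefix):
            return int(line.split()[1])
    return None


# Made-up sample for illustration. On a live system, read the real file:
#   with open("/proc/meminfo") as f:
#       text = f.read()
SAMPLE = """\
MemTotal:        8167848 kB
MemFree:         1409696 kB
MemAvailable:    5624448 kB
"""

available = meminfo_field_kib(SAMPLE, "MemAvailable")
if available is None:
    # Older kernels do not report MemAvailable; MemFree is a rough lower bound.
    available = meminfo_field_kib(SAMPLE, "MemFree")

available_mib = available // 1024
```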

(In reply to Newell Jensen's comment above, sent Sun, Dec 20, 2015 at 11:04 PM.)

Newell Jensen (newell-jensen) wrote :

My MAAS server (host) has approximately 60 GB of free memory. I also monitored this during deployment to make sure that something really odd was not happening. If access is needed to this system I can provide it.

Newell Jensen (newell-jensen) wrote :

I should also note that the effort described in comment 4 (https://bugs.launchpad.net/curtin/+bug/1520400/comments/4) used a flat filesystem, as I was trying to sidestep any potential LVM issues.

Scott, is there anything you need from me to push forward on this second effort?

Newell Jensen (newell-jensen) wrote :

I have now successfully deployed on xgene-2, using the fixes from the branch that was merged (linked to this bug) and a root-tgz that was not junk (thanks to Scott for helping me debug that).

I am setting this to Fix Committed, as the fixes have landed in trunk.

Changed in curtin:
status: New → Fix Committed
Changed in curtin:
status: Fix Committed → Fix Released