Fail to install or commission wichita (power 8)

Bug #1717176 reported by Christian Ehrhardt 
This bug affects 1 person
Affects          Status   Importance  Assigned to  Milestone
MAAS             Invalid  Undecided   Unassigned
curtin (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

MAAS version: 2.2.2 (6099-g8751f91-0ubuntu1~16.04.1)
curtin version: 0.1.0~bzr505-0ubuntu1~16.04.1

Ends in:
[Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
curtin: Installation failed with exception: Unexpected error while running command.

Since this was a MAAS upgrade, old config data could have been the issue, but the system was recommissioned and still reproducibly fails at this point.

Recommissioning fails at:
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 32.: HTTP Error 400: BAD REQUEST

This happens at stage "00-maas-07-block-devices".

Christian Ehrhardt  (paelzer) wrote :

This is on the new Power MAAS on Kurhah when deploying wichita.
It might be due to the upgrade (a bug) or just something odd with our setup there, so let's start with "Invalid" and collect info.
If it turns out to be a bug or is up for discussion, I'll switch it back to New.

Changed in maas:
status: New → Invalid
Changed in curtin (Ubuntu):
status: New → Invalid
summary: - Fail to install Power box
+ Fail to install wichita box
Christian Ehrhardt  (paelzer) wrote : Re: Fail to install wichita box

Full console log and MAAS error report on the failed Xenial deploy

description: updated
Christian Ehrhardt  (paelzer) wrote :

Tail of the console log and MAAS error report on the failed Artful deploy

Christian Ehrhardt  (paelzer) wrote :

As expected, since the failure is still inside curtin, the target release (Artful or Xenial) makes no difference.

Christian Ehrhardt  (paelzer) wrote :

Cloud-init runs through without many issues, then starts curtin, which fails with the error reported above.
Not much else happens between the start of curtin and the error.

[ 65.296194] cloud-init[4049]: curtin: Installation started. (0.1.0~bzr505-0ubuntu1~16.04.1)
[ 65.303624] cloud-init[4049]: third party drivers not installed or necessary.
[ 66.450201] cloud-init[4049]: [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
[ 66.468709] cloud-init[4049]: curtin: Installation failed with exception: Unexpected error while running command.
[ 66.469210] cloud-init[4049]: Command: ['curtin', 'block-meta', 'custom']
[ 66.469493] cloud-init[4049]: Exit code: 3
[ 66.469719] cloud-init[4049]: Reason: -
[ 66.469940] cloud-init[4049]: Stdout: [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
[ 66.470255] cloud-init[4049]:
[ 66.470465] cloud-init[4049]: Stderr: ''
[ 66.613480] cloud-init[4049]: Unexpected error while running command.
[ 66.613789] cloud-init[4049]: Command: ['curtin', 'block-meta', 'custom']
[ 66.614046] cloud-init[4049]: Exit code: 3
[ 66.614256] cloud-init[4049]: Reason: -
[ 66.614472] cloud-init[4049]: Stdout: [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
[ 66.614778] cloud-init[4049]:
[ 66.614993] cloud-init[4049]: Stderr: ''
[ 66.628125] cloud-init[4049]: Cloud-init v. 0.7.9 running 'modules:final' at Thu, 14 Sep 2017 06:30:33 +0000. Up 64.42 seconds.
[ 66.628555] cloud-init[4049]: 2017-09-14 06:30:35,451 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [3]

Christian Ehrhardt  (paelzer) wrote :

That would likely be the first md_check_array_state call.
 sync_action = md_sysfs_attr(md_devname, 'sync_action')

It checks whether the MD array is idle or whether it has to wait for a sync to complete (or to stop).
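The check described above boils down to reading one sysfs attribute. Below is a minimal Python sketch, assuming the standard md sysfs layout /sys/class/block/<dev>/md/sync_action. The names read_md_attr and md_is_idle are hypothetical and are not curtin's md_sysfs_attr or md_check_array_state; treating EACCES as "state unknown" instead of raising is one possible defensive choice, not what curtin 0.1.0~bzr505 does.

```python
import errno
import os

# Standard sysfs block directory; taken as a parameter so the sketch can be
# exercised against a fake directory tree instead of the real /sys.
SYSFS_BLOCK = "/sys/class/block"


def read_md_attr(md_devname, attrname, sysfs_block=SYSFS_BLOCK):
    """Return the stripped contents of <sysfs_block>/<dev>/md/<attrname>.

    Returns None instead of raising when the file is missing (ENOENT) or
    unreadable (EACCES/EPERM, the Errno 13 case from this bug).
    """
    dev = os.path.basename(md_devname)  # accepts "/dev/md127" or "md127"
    path = os.path.join(sysfs_block, dev, "md", attrname)
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError as e:
        if e.errno in (errno.ENOENT, errno.EACCES, errno.EPERM):
            return None
        raise


def md_is_idle(md_devname, sysfs_block=SYSFS_BLOCK):
    """True if sync_action is 'idle', False if some other action (e.g. a
    resync) is in progress, None if the state could not be read."""
    action = read_md_attr(md_devname, "sync_action", sysfs_block)
    if action is None:
        return None
    return action == "idle"
```

Keeping None distinct from False lets a caller choose between waiting for a running sync and proceeding when the state is simply unknown, rather than aborting the whole install on one unreadable attribute.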

Since this is an upgraded MAAS, maybe the old DB entries for the node's disks create some odd config.
So: release and recommission (and let it forget all old config).

Christian Ehrhardt  (paelzer) wrote :

Hmm, commissioning failed as well, but at a different stage.

Name                                      Time                        Status
00-maas-00-support-info                   Thu, 14 Sep. 2017 07:05:39  Passed
00-maas-00-support-info.err               Thu, 14 Sep. 2017 07:05:39  Passed
00-maas-01-cpuinfo                        Thu, 14 Sep. 2017 07:05:39  Passed
00-maas-01-lshw                           Thu, 14 Sep. 2017 07:06:01  Passed
00-maas-02-virtuality                     Thu, 14 Sep. 2017 07:06:01  Passed
00-maas-03-install-lldpd                  Thu, 14 Sep. 2017 07:06:02  Passed
00-maas-03-install-lldpd.err              Thu, 14 Sep. 2017 07:06:02  Passed
00-maas-04-list-modaliases                Thu, 14 Sep. 2017 07:06:02  Passed
00-maas-06-dhcp-unconfigured-ifaces       Thu, 14 Sep. 2017 07:06:04  Passed
00-maas-07-block-devices                  Thu, 14 Sep. 2017 07:16:43  Timed out
00-maas-08-serial-ports                   Thu, 14 Sep. 2017 07:16:43  Aborted
99-maas-02-capture-lldp                   Thu, 14 Sep. 2017 07:16:43  Aborted
99-maas-03-network-interfaces             Thu, 14 Sep. 2017 07:16:43  Aborted
99-maas-04-network-interfaces-with-sriov  Thu, 14 Sep. 2017 07:16:43  Aborted

Christian Ehrhardt  (paelzer) wrote :

Here is the console log from commissioning.
It ends in HTTP errors.

The IP is correct, but the path seems to be missing something (the '//' maybe?).
I don't know what generates this URL, but 2012-03-01 seems old?

This seems to be after some install activity:

Cloud-init v. 0.7.5 running 'modules:final' at Thu, 14 Sep 2017 07:05:23 +0000. Up 57.45 seconds.
Ign http://ports.ubuntu.com trusty InRelease
[...] (apt)
Setting up sshpass (1.05-1) ...
Processing triggers for libc-bin (2.19-0ubuntu6.13) ...
Processing triggers for ureadahead (0.100.0-16) ...
modprobe: ERROR: could not insert 'ipmi_si': No such device
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 1.: HTTP Error 400: BAD REQUEST
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 1.: HTTP Error 400: BAD REQUEST
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 2.: HTTP Error 400: BAD REQUEST
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 4.: HTTP Error 400: BAD REQUEST
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 8.: HTTP Error 400: BAD REQUEST
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 16.: HTTP Error 400: BAD REQUEST
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 32.: HTTP Error 400: BAD REQUEST
FAIL: HTTP error [400]
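A small Python sketch that makes the two observations about this URL concrete by parsing it with the standard library's urllib.parse. The helper name inspect_metadata_url is hypothetical; the default of port 80 for a port-less http URL is ordinary URL semantics. The sketch only shows what the URL contains; it does not establish that either detail caused the 400.

```python
from urllib.parse import urlsplit


def inspect_metadata_url(url):
    """Report the host, the effective port, and whether the path
    contains an empty segment (the '//' seen in the log)."""
    parts = urlsplit(url)
    # urlsplit leaves .port as None when the URL has no explicit port,
    # in which case the scheme default applies.
    default = 443 if parts.scheme == "https" else 80
    segments = parts.path.split("/")
    # The leading '/' yields segments[0] == ''; a trailing '/' yields a
    # final ''. Only empty segments in between indicate a '//'.
    inner = segments[1:-1] if parts.path.endswith("/") else segments[1:]
    return {
        "host": parts.hostname,
        "port": parts.port if parts.port is not None else default,
        "explicit_port": parts.port is not None,
        "empty_segment": "" in inner,
    }


print(inspect_metadata_url("http://10.245.71.3/MAAS/metadata//2012-03-01/"))
# {'host': '10.245.71.3', 'port': 80, 'explicit_port': False, 'empty_segment': True}
```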

description: updated
Christian Ehrhardt  (paelzer) wrote :

This happens at commissioning stage "00-maas-07-block-devices".

This commissioning error is reproducible and makes the machine unusable.

summary: - Fail to install wichita box
+ Fail to install or commission wichita (power 8)
description: updated
Christian Ehrhardt  (paelzer) wrote :

We need to fix recommissioning first; then we can look at the curtin issue (if it still exists after a successful recommission).
Therefore I'm setting the MAAS task to New and keeping curtin Invalid for now.

Changed in maas:
status: Invalid → New
Ryan Harper (raharper) wrote : Re: [Bug 1717176] Re: Fail to install wichita box

On Thu, Sep 14, 2017 at 1:44 AM, Christian Ehrhardt <email address hidden> wrote:

> Full console log and Maas error report on the failed Xenial deploy
>
> ** Description changed:
>
> - Version: 2.2.2 (6099-g8751f91-0ubuntu1~16.04.1)
> + Maas version: 2.2.2 (6099-g8751f91-0ubuntu1~16.04.1)
> + curtin version : 0.1.0~bzr505-0ubuntu1~16.04.1
>
> - Maas output in install tab:
> -
> - curtin: Installation started. (0.1.0~bzr505-0ubuntu1~16.04.1)
> - third party drivers not installed or necessary.
> + Ends in:
> [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
> curtin: Installation failed with exception: Unexpected error while running command.
> - Command: ['curtin', 'block-meta', 'custom']
> - Exit code: 3
> - Reason: -
> - Stdout: [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
>

This is bug https://bugs.launchpad.net/bugs/1708052, which is fixed in trunk.

Joshua Powers (powersj) wrote :

Two changes were made to the system:

1) Added the curtin-dev/daily PPA and upgraded curtin.

2) Updated maas_url in /etc/maas/rackd.conf, per roaksoax, to include port 5240.

Restarted maas-rackd and recommissioned successfully.
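Change 2 amounts to giving maas_url an explicit port. A minimal Python sketch of that rewrite using urllib.parse; with_explicit_port is a hypothetical helper, and "http://10.245.71.3/MAAS" is an assumed old value inferred from the metadata URLs in the commissioning log, not the actual contents of rackd.conf. In the thread the file was edited by hand and maas-rackd restarted.

```python
from urllib.parse import urlsplit, urlunsplit

# Port from the fix described above.
MAAS_PORT = 5240


def with_explicit_port(maas_url, port=MAAS_PORT):
    """Return maas_url with an explicit port; unchanged if it already has one."""
    parts = urlsplit(maas_url)
    if parts.port is not None:
        return maas_url
    # netloc carries no port here, so appending ':<port>' is valid for
    # hostnames, IPv4 addresses and bracketed IPv6 literals alike.
    # rstrip(':') drops a dangling colon such as in 'host:'.
    netloc = f"{parts.netloc.rstrip(':')}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(with_explicit_port("http://10.245.71.3/MAAS"))
# http://10.245.71.3:5240/MAAS
```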

Christian Ehrhardt  (paelzer) wrote :

With those two changes in place, things "work".
- You already commissioned it: ok
- I was able to install wichita: ok

I put "work" in quotes because MAAS officially considers it a failed deployment.
Maybe some timeout makes it think so, but the deployment worked and I can use the system.

Did we lose some past config that made it a bit more patient?
@MAAS team, any recommendations?

... attaching logs

Christian Ehrhardt  (paelzer) wrote :

roflcopter :-)

I was debugging two MAAS issues at once, one on ARM and one on POWER8, and I just looked at the wrong one :-)
Remove the quotes: this one just works.

We still have to wait until the new curtin is released, but that is already tracked.
Marking this invalid.

Thanks everybody involved.

Changed in maas:
status: New → Invalid