Fail to install or commission wichita (power 8)

Bug #1717176 reported by ChristianEhrhardt on 2017-09-14
This bug affects 1 person
Affects          Importance  Assigned to
MAAS             Undecided   Unassigned
curtin (Ubuntu)  Undecided   Unassigned

Bug Description

Maas version: 2.2.2 (6099-g8751f91-0ubuntu1~16.04.1)
curtin version : 0.1.0~bzr505-0ubuntu1~16.04.1

Ends in:
[Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
curtin: Installation failed with exception: Unexpected error while running command.

Since this is an upgraded MAAS, stale config data could have been the cause, but the system was recommissioned and still fails reproducibly at this point.

Recommissioning fails at:
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 32.: HTTP Error 400: BAD REQUEST

This happens at stage "00-maas-07-block-devices".

ChristianEhrhardt (paelzer) wrote :

This is on the new Power MAAS on Kurhah when deploying Wichita.
It might be due to the upgrade (a bug) or just something odd with our setup there, so let's start with "Invalid" and collect info.
If it turns out to be a bug or is up for discussion, I'll switch it back to New.

Changed in maas:
status: New → Invalid
Changed in curtin (Ubuntu):
status: New → Invalid
summary: - Fail to install Power box
+ Fail to install wichita box

Full console log and Maas error report on the failed Xenial deploy

description: updated
ChristianEhrhardt (paelzer) wrote :

Tail of the console log and Maas error report on the failed Artful deploy

ChristianEhrhardt (paelzer) wrote :

As expected, since the failure is still in curtin, the target release (Artful or Xenial) does not make a difference.

ChristianEhrhardt (paelzer) wrote :

cloud-init runs through without many issues, then it starts curtin, which fails with the error reported above.
Not much happens between the start of curtin and the error.

[ 65.296194] cloud-init[4049]: curtin: Installation started. (0.1.0~bzr505-0ubuntu1~16.04.1)
[ 65.303624] cloud-init[4049]: third party drivers not installed or necessary.
[ 66.450201] cloud-init[4049]: [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
[ 66.468709] cloud-init[4049]: curtin: Installation failed with exception: Unexpected error while running command.
[ 66.469210] cloud-init[4049]: Command: ['curtin', 'block-meta', 'custom']
[ 66.469493] cloud-init[4049]: Exit code: 3
[ 66.469719] cloud-init[4049]: Reason: -
[ 66.469940] cloud-init[4049]: Stdout: [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
[ 66.470255] cloud-init[4049]:
[ 66.470465] cloud-init[4049]: Stderr: ''
[ 66.613480] cloud-init[4049]: Unexpected error while running command.
[ 66.613789] cloud-init[4049]: Command: ['curtin', 'block-meta', 'custom']
[ 66.614046] cloud-init[4049]: Exit code: 3
[ 66.614256] cloud-init[4049]: Reason: -
[ 66.614472] cloud-init[4049]: Stdout: [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
[ 66.614778] cloud-init[4049]:
[ 66.614993] cloud-init[4049]: Stderr: ''
[ 66.628125] cloud-init[4049]: Cloud-init v. 0.7.9 running 'modules:final' at Thu, 14 Sep 2017 06:30:33 +0000. Up 64.42 seconds.
[ 66.628555] cloud-init[4049]: 2017-09-14 06:30:35,451 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [3]

ChristianEhrhardt (paelzer) wrote :

That would likely be the first md_check_array_state call.
 sync_action = md_sysfs_attr(md_devname, 'sync_action')

It checks if the MD is idle or if it has to wait for a sync to complete (or to stop)
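
To make that check concrete, here is a minimal sketch of reading an MD array's sync_action attribute from sysfs and testing for the idle state. This is not curtin's code: read_sync_action, md_is_idle and the sysfs_root parameter are names made up for illustration, and only the sysfs path layout is taken from the error message above.

```python
from pathlib import Path


def read_sync_action(md_devname, sysfs_root="/sys/class/block"):
    """Return the current sync_action of an MD array, e.g. 'idle' or 'resync'.

    Hypothetical helper, not curtin's md_sysfs_attr. Raises OSError
    (for example PermissionError, errno 13) if the attribute cannot be read.
    """
    attr = Path(sysfs_root) / md_devname / "md" / "sync_action"
    return attr.read_text().strip()


def md_is_idle(md_devname, sysfs_root="/sys/class/block"):
    """True if the array reports 'idle', False if some sync action is running."""
    return read_sync_action(md_devname, sysfs_root) == "idle"
```

On the affected node, calling read_sync_action("md127") would be one way to see whether the plain read already hits the Permission denied error outside of curtin.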

Now this is an upgraded MAAS; maybe the old DB entries for the node's disks create some odd config.
So the next step is release + recommissioning (to let it forget all old config).

ChristianEhrhardt (paelzer) wrote :

Hmm, commissioning failed as well, but obviously at a different stage.

Name                                      Time                        Status
00-maas-00-support-info                   Thu, 14 Sep. 2017 07:05:39  Passed
00-maas-00-support-info.err               Thu, 14 Sep. 2017 07:05:39  Passed
00-maas-01-cpuinfo                        Thu, 14 Sep. 2017 07:05:39  Passed
00-maas-01-lshw                           Thu, 14 Sep. 2017 07:06:01  Passed
00-maas-02-virtuality                     Thu, 14 Sep. 2017 07:06:01  Passed
00-maas-03-install-lldpd                  Thu, 14 Sep. 2017 07:06:02  Passed
00-maas-03-install-lldpd.err              Thu, 14 Sep. 2017 07:06:02  Passed
00-maas-04-list-modaliases                Thu, 14 Sep. 2017 07:06:02  Passed
00-maas-06-dhcp-unconfigured-ifaces       Thu, 14 Sep. 2017 07:06:04  Passed
00-maas-07-block-devices                  Thu, 14 Sep. 2017 07:16:43  Timed out
00-maas-08-serial-ports                   Thu, 14 Sep. 2017 07:16:43  Aborted
99-maas-02-capture-lldp                   Thu, 14 Sep. 2017 07:16:43  Aborted
99-maas-03-network-interfaces             Thu, 14 Sep. 2017 07:16:43  Aborted
99-maas-04-network-interfaces-with-sriov  Thu, 14 Sep. 2017 07:16:43  Aborted

ChristianEhrhardt (paelzer) wrote :

Here is the console log from commissioning.
It ends in HTTP errors.

The IP is correct, but the path contains a '//', so maybe something is missing there?
I don't know what generates this URL, but 2012-03-01 seems old.

This seems to be after some install activity:

Cloud-init v. 0.7.5 running 'modules:final' at Thu, 14 Sep 2017 07:05:23 +0000. Up 57.45 seconds.
Ign http://ports.ubuntu.com trusty InRelease
[...] (apt)
Setting up sshpass (1.05-1) ...
Processing triggers for libc-bin (2.19-0ubuntu6.13) ...
Processing triggers for ureadahead (0.100.0-16) ...
modprobe: ERROR: could not insert 'ipmi_si': No such device
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 1.: HTTP Error 400: BAD REQUEST
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 1.: HTTP Error 400: BAD REQUEST
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 2.: HTTP Error 400: BAD REQUEST
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 4.: HTTP Error 400: BAD REQUEST
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 8.: HTTP Error 400: BAD REQUEST
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 16.: HTTP Error 400: BAD REQUEST
request to http://10.245.71.3/MAAS/metadata//2012-03-01/ failed. sleeping 32.: HTTP Error 400: BAD REQUEST
FAIL: HTTP error [400]
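
The doubled slash in the metadata URL from the log can be spotted mechanically. Below is a small sketch that lists empty path segments in a URL; empty_path_segments is a made-up helper, and whether the '//' is really what triggers the 400 is not established here.

```python
from typing import List
from urllib.parse import urlsplit


def empty_path_segments(url: str) -> List[int]:
    """Return the indexes of empty segments inside a URL's path.

    Leading and trailing slashes are ignored, so only a doubled slash
    in the middle of the path (as in '.../metadata//2012-03-01/') counts.
    """
    trimmed = urlsplit(url).path.strip("/")
    if not trimmed:
        return []
    return [i for i, seg in enumerate(trimmed.split("/")) if seg == ""]
```

For the URL in the log, the segments are MAAS, metadata, an empty one, and 2012-03-01, so the helper points at the gap between "metadata" and the version string.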

description: updated
ChristianEhrhardt (paelzer) wrote :

This happens at commissioning stage "00-maas-07-block-devices".

The commissioning error is reproducible and makes the system unusable.

summary: - Fail to install wichita box
+ Fail to install or commission wichita (power 8)
description: updated
ChristianEhrhardt (paelzer) wrote :

We need to fix the recommissioning first, and can then take a look at the curtin issue (if it even still exists after a working recommission).
Therefore I am setting the MAAS bug to New and keeping curtin Invalid for now.

Changed in maas:
status: Invalid → New

On Thu, Sep 14, 2017 at 1:44 AM, ChristianEhrhardt <email address hidden> wrote:

> Full console log and Maas error report on the failed Xenial deploy
>
> ** Description changed:
>
> - Version: 2.2.2 (6099-g8751f91-0ubuntu1~16.04.1)
> + Maas version: 2.2.2 (6099-g8751f91-0ubuntu1~16.04.1)
> + curtin version : 0.1.0~bzr505-0ubuntu1~16.04.1
>
> - Maas output in install tab:
> -
> - curtin: Installation started. (0.1.0~bzr505-0ubuntu1~16.04.1)
> - third party drivers not installed or necessary.
> + Ends in:
> [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
> curtin: Installation failed with exception: Unexpected error while
> running command.
> - Command: ['curtin', 'block-meta', 'custom']
> - Exit code: 3
> - Reason: -
> - Stdout: [Errno 13] Permission denied: '/sys/class/block/md127/md/
> sync_action'
>

This is bug https://bugs.launchpad.net/bugs/1708052, which is fixed in trunk.

Joshua Powers (powersj) wrote :

Two changes were made to the system:

1) Added the curtin-dev/daily PPA and upgraded curtin

2) Updated the maas_url in /etc/maas/rackd.conf per roaksoax to include the port 5240

Restarted maas-rackd and recommissioned successfully.
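
For the second change, a sketch of adding an explicit port to a URL that lacks one may help when checking other setups. The helper name ensure_port and the example host are hypothetical; the actual maas_url value on this system is not shown in this report.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_port(url: str, port: int) -> str:
    """Return url unchanged if it already names a port, else add the given one.

    Hypothetical helper for illustration; it does not read or write rackd.conf.
    """
    parts = urlsplit(url)
    if parts.port is not None:
        return url
    # rstrip(":") covers a netloc like "host:" that has a colon but no port
    netloc = f"{parts.netloc.rstrip(':')}:{port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

With a placeholder host, ensure_port("http://maas.example/MAAS", 5240) gives a URL with :5240 after the host name, which is the shape of the change described above.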

ChristianEhrhardt (paelzer) wrote :

With those two changes in place things "work".
- you already commissioned - ok
- I was able to install wichita - ok

I put "work" in quotes because MAAS officially considers it a failed deployment.
Maybe some timeouts make it think so, but it worked and I can use the system.

Did we lose some past configuration that made MAAS a bit more patient with it?
@Maas team, any recommendations?

... attaching logs

ChristianEhrhardt (paelzer) wrote :

roflcopter :-)

I debugged two maas issues at once: one on arm and one on power8.
I just looked at the wrong one :-)
Remove the quotes - this one here just works.

Eventually we have to wait until the new curtin gets released, but that is already tracked.
Marking this invalid.

Thanks everybody involved.

Changed in maas:
status: New → Invalid