Can't enlist/commission with large number of bcache devices

Bug #1933793 reported by Andrey Grebennikov
This bug affects 1 person
Affects     Status   Importance  Assigned to  Milestone
MAAS        Expired  Undecided   Unassigned
cloud-init  Expired  Undecided   Unassigned

Bug Description

MAAS 2.9 or 3.0, commissioning image 18.04 or 20.04

Registered a number of servers with 24 drives and created 20 bcache devices.
Provisioned the servers and then released them.
Now I can enlist or commission them at most one time out of 10.
The server starts booting from PXE, gets an address, downloads the kernel and initramfs, and starts mounting the remote rootfs. However, going through the list of bcache devices takes so long that the server never reaches the point of activating its network adapters, so the rootfs mount fails and the server falls into the rootfs console.

This bug is somewhat similar to the one that forces the use of the wipe-bcache commissioning script from FE, but that workaround does not apply here because commissioning can't complete either.

affects: maas → curtin
affects: curtin → maas
Alberto Donato (ack) wrote :

Could you please attach the curtin and cloud-init logs and the rsyslog output for one of the deployed servers?

Changed in maas:
status: New → Incomplete
Andrey Grebennikov (agrebennikov) wrote :

The problem is that this issue isn't visible from anywhere but the remote graphical console while watching the boot. The server doesn't activate the network, so nothing is sent to rsyslog.
All I have is a video recording of the boot process.

James Falcon (falcojr)
Changed in cloud-init:
status: New → Incomplete
Andrey Grebennikov (agrebennikov) wrote :
Andrey Grebennikov (agrebennikov) wrote :
Andrey Grebennikov (agrebennikov) wrote :

@Alberto are you maintaining a custom initramfs within the MAAS image? If so, I'd still consider this a MAAS bug.

Alberto Donato (ack) wrote :

@Andrey, MAAS doesn't build a custom initramfs. The initramfs images are taken from official Ubuntu images and published in the MAAS streams.

Chad Smith (chad.smith) wrote :

Not sure if it helps, but curtin/cloud-init logs can be captured by following some of the steps documented here: https://discourse.maas.io/t/getting-curtin-debug-logs/169

Andrey Grebennikov (agrebennikov) wrote :

@chad the point is that the node never gets far enough to execute anything like cloud-init or curtin. It stops before the network is activated, so it doesn't communicate with MAAS at all. The only access to the node during this process is through the graphical console.

Steve Langasek (vorlon) wrote :

The actual failure that I see in the video of the boot sequence is that the initramfs doesn't have a route to the server. That doesn't seem like an initramfs bug?

Andrey Grebennikov (agrebennikov) wrote :

@Steve if I wipe the first 100M of each individual block device while in the initramfs (to remove all the bcache info) and restart, it boots perfectly. The reason you see no route to the server is that it never got to the point of activating the networking.
In fact, sometimes I get lucky and network activation does happen (1 out of 10 attempts), and then the server boots.
Have a look at the "successful" attempt uploaded.
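For anyone reproducing the manual workaround above, here is a minimal Python sketch. It is not part of the original report and is not how MAAS, curtin or the wipe-bcache script clear devices. It assumes the bcache on-disk layout from the Linux kernel's bcache headers: the superblock starts at byte 4096 (sector 8) and its 16-byte magic begins 24 bytes in, after the csum, offset and version fields. The helper names `has_bcache_superblock` and `wipe_head` are hypothetical. Wiping is destructive; on a real system, `wipefs -a` from util-linux is the usual tool for erasing such signatures.

```python
import os
import tempfile

# Assumed bcache on-disk layout: the superblock starts at sector 8 (byte 4096)
# and its 16-byte magic follows three 8-byte fields (csum, offset, version).
BCACHE_MAGIC = bytes.fromhex("c68573f64e1a45ca8265f57f48ba6d81")
BCACHE_MAGIC_OFFSET = 4096 + 3 * 8


def has_bcache_superblock(path: str) -> bool:
    """Return True if the device or image at `path` carries the bcache magic."""
    with open(path, "rb") as f:
        f.seek(BCACHE_MAGIC_OFFSET)
        return f.read(len(BCACHE_MAGIC)) == BCACHE_MAGIC


def wipe_head(path: str, nbytes: int = 100 * 1024 * 1024, chunk: int = 1024 * 1024) -> None:
    """Zero the first `nbytes` of `path`, capped at its size. DESTRUCTIVE."""
    zeros = bytes(chunk)
    with open(path, "r+b") as f:
        # Seeking to the end reports the size of regular files and, on Linux, block devices.
        size = f.seek(0, os.SEEK_END)
        limit = min(nbytes, size)
        f.seek(0)
        written = 0
        while written < limit:
            n = min(chunk, limit - written)
            f.write(zeros[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())


if __name__ == "__main__":
    # Demo on a throwaway 1 MiB image file, never on a real disk.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(1024 * 1024))
        tmp.seek(BCACHE_MAGIC_OFFSET)
        tmp.write(BCACHE_MAGIC)
    print(has_bcache_superblock(tmp.name))  # True
    wipe_head(tmp.name)
    print(has_bcache_superblock(tmp.name))  # False
    os.remove(tmp.name)
```

Zeroing the first 100 MiB, as in the comment above, covers the superblock at 4 KiB with a wide margin; checking for the magic first avoids touching devices that carry no bcache signature.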

Andrey Grebennikov (agrebennikov) wrote :
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for cloud-init because there has been no activity for 60 days.]

Changed in cloud-init:
status: Incomplete → Expired
Andrey Grebennikov (agrebennikov) wrote :

reactivating

James Falcon (falcojr) wrote :