Libvirt machines deployed with MAAS and Precise are only able to boot successfully once

Bug #1297899 reported by Dan Poler
This bug affects 1 person
Affects    | Status       | Importance | Assigned to      | Milestone
juju-core  | Fix Released | High       | Wayne Witzel III |

Bug Description

NOTE: I don't know whether this is a Juju bug or a MAAS bug, but I'm opening it as a Juju bug because I cannot reproduce it without using Juju, and someone else has reported something similar.

* MAAS version: 1.5+bzr1909-0ubuntu1 on Trusty
* Juju version: 1.17.2-0ubuntu2

* MAAS environment running in Libvirt. The MAAS server is a VM (1 CPU, 2 GB RAM).
* Six additional VMs commissioned in MAAS, named node1-node6. node1 has 1 CPU and 2 GB RAM and is used as the Juju bootstrap node; node2-node6 have 1 CPU and 1 GB RAM each. Power control uses MAAS's libvirt support, which is configured and working fine.
* Bootstrap Juju with juju bootstrap --upload-tools --constraints="mem=2G" to ensure that it bootstraps to node1, the node with 2 GB RAM. This deploys on Precise using the Fastpath installer.
* Reset the constraints with juju set-constraints "mem=512M", because the only 2 GB node is now in use as the bootstrap node.
* Bootstrap succeeds as far as I can tell: the node reboots and comes up, and juju status shows the agent started.
* Deploy another node, e.g. "juju deploy ubuntu" or "juju deploy mysql" or "juju deploy foo" -- doesn't really matter what you deploy.
* Now reboot any machine in this environment, including Juju machine 0 (the bootstrap node), e.g. juju ssh 1, then sudo reboot.
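The constraint juggling in the steps above can be modeled with a short Python sketch. It assumes that a Juju mem constraint is a minimum-memory requirement and that each node reports exactly the nominal RAM listed above (2 GB for node1, 1 GB for node2-node6). NODES, parse_mem and pick_node are hypothetical names, and the first-fit selection is a simplification, not the MAAS allocator.

```python
from typing import Dict, Optional

# Node inventory from the report: nominal RAM in MiB (an assumption; MAAS may
# record slightly different figures after commissioning).
NODES: Dict[str, int] = {
    "node1": 2048,
    "node2": 1024,
    "node3": 1024,
    "node4": 1024,
    "node5": 1024,
    "node6": 1024,
}


def parse_mem(value: str) -> int:
    """Convert a mem constraint value such as '2G' or '512M' to MiB."""
    units = {"M": 1, "G": 1024}
    suffix = value[-1].upper()
    if suffix in units:
        return int(float(value[:-1]) * units[suffix])
    return int(value)  # assumption: a bare number is already in MiB


def pick_node(constraint: str, free: Dict[str, int]) -> Optional[str]:
    """Return the first free node (by name) whose RAM meets a 'mem=...' minimum."""
    key, _, value = constraint.partition("=")
    if key != "mem":
        raise ValueError("this sketch only models the mem constraint")
    need = parse_mem(value)
    for name, mem in sorted(free.items()):
        if mem >= need:
            return name
    return None


free = dict(NODES)
bootstrap = pick_node("mem=2G", free)  # only node1 has 2 GB
del free[bootstrap]
print(bootstrap)                       # node1
print(pick_node("mem=2G", free))       # None: no 2 GB node is left
print(pick_node("mem=512M", free))     # node2
```

Under this model, if the bootstrap constraint also persisted as the environment default, every later deploy would be unsatisfiable, which is consistent with lowering it to 512M before deploying.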

For the subsequent reboot AFTER the one during provisioning, any system will not be able to successfully reboot.
* GRUB comes up and the kernel loads, then boot hangs with "The disk drive for / is not yet ready or not present. Continue to wait" etc.
* cloud-init-nonet waits 120 seconds for a network device.
* (Wait 120 seconds)
* Boot eventually continues, but eth0 does not get an address. ci-info shows "!!!!!Route info failed!!!!!" and boot hangs again for a while.
* Boot reaches "Waiting for network configuration" and "Waiting up to 60 more seconds for network configuration" and hangs again for a bit.
* The node eventually reaches a login prompt, but networking has not come up, so I can't log in.
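When reproducing this, it can help to scan a captured console log or cloud-init-output.log for the failure markers quoted above. The following is a hypothetical Python helper (SYMPTOMS and find_symptoms are illustrative names); the marker strings are copied from this report rather than from the upstream sources, so the exact wording on a given system may differ.

```python
import re
from typing import List

# Failure markers quoted in the bug description. The disk message is matched
# loosely because its exact wording may vary between releases.
SYMPTOMS = {
    "root disk not ready": re.compile(r"The disk drive for / is not (yet )?ready"),
    "route info failed": re.compile(r"Route info failed"),
    "network config timeout": re.compile(
        r"Waiting up to \d+ more seconds for network configuration"
    ),
}


def find_symptoms(log_text: str) -> List[str]:
    """Return the names of the known failure markers present in log_text."""
    return [name for name, pattern in SYMPTOMS.items() if pattern.search(log_text)]


sample = """
ci-info: !!!!!Route info failed!!!!!
Waiting for network configuration...
Waiting up to 60 more seconds for network configuration...
"""
print(find_symptoms(sample))  # ['route info failed', 'network config timeout']
```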

cloud-init logs and screenshots of the booting node will be attached. This happens to every node Juju provisions and is easily reproducible. It does NOT happen to a node that is only commissioned and started via the MAAS GUI.

Dan Poler (l-dan) wrote:

cloud-init.log

Dan Poler (l-dan) wrote:

cloud-init-output.log

Dan Poler (l-dan) wrote:

Screenshots of console during failed boot

Dan Poler (l-dan) wrote:

To clarify the statement "For the subsequent reboot AFTER the one during provisioning, any system will not be able to successfully reboot.", "any system" means either any machine deployed via a charm OR the Juju bootstrap node (machine 0).

Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
tags: added: maas-provider
Changed in juju-core:
milestone: none → 1.18.0
Changed in juju-core:
assignee: nobody → Wayne Witzel III (wwitzel3)

Wayne Witzel III (wwitzel3) wrote:

I was able to reproduce the error consistently with 1.17.2. I am unable to reproduce it with the current trunk of juju-core, so it appears to be fixed there.

Changed in juju-core:
status: Triaged → Confirmed

Wayne Witzel III (wwitzel3) wrote:

Out of curiosity, I just tested with the latest release, 1.17.6, and I am unable to reproduce the error there either.

Changed in juju-core:
status: Confirmed → Fix Released
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.18.0 → 1.17.7