Commissioning with a Saucy image sets node status to "Failed tests"

Bug #1237364 reported by Raphaël Badin on 2013-10-09
This bug affects 4 people
Affects          Importance   Assigned to
MAAS             Critical     Andres Rodriguez
maas (Ubuntu)    Critical     Unassigned

Bug Description

I commissioned an amd64 node with a Precise commissioning image without problems. When I tried to commission the same node using a Saucy image, commissioning failed and the node was marked "Failed tests" (the node's page reported: "failed [3/5] ( 00-maas-01-lshw 00-maas-02-virtuality 99-maas-02-capture-lldp)").

This is happening in the lab, so it should be easy to reproduce and investigate.


Changed in maas:
importance: High → Critical
Raphaël Badin (rvb) wrote :

I can confirm that this is only happening with Saucy. I commissioned the same node successfully using Precise, Quantal and Raring.

Raphaël Badin (rvb) on 2013-10-09
summary: - Commissioning with a Saucy image failed.
+ Commissioning with a Saucy image sets node status to "Failed tests"
Changed in maas:
assignee: nobody → Andres Rodriguez (andreserl)
status: Triaged → In Progress
Changed in maas (Ubuntu):
status: New → Confirmed
importance: Undecided → Critical
Raphaël Badin (rvb) on 2013-10-10
Changed in maas:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package maas - 1.4+bzr1693+dfsg-0ubuntu1

---------------
maas (1.4+bzr1693+dfsg-0ubuntu1) saucy; urgency=low

  * New Upstream Release (LP: #1218526)
    - This new upstream release contains fixes and improvements of the
      features approved by the FFe above.
    - Fixes commissioning failure on Saucy with 'Failed Test' (LP: #1237364)
    - Fixes access of static images over http (LP: #1236544)
  * d/maas-cluster-controller.postinst: a2enmod version module (LP: #1236544)
  * d/control: Bump depends on python-django to 1.4. (LP: #1236572)
  * d/maas-dhcp.postinst: Fail gracefully if apparmor_parser fails, allowing
    to install maas-dhcp during an ISO install. (LP: #1236786)
 -- Andres Rodriguez <email address hidden> Fri, 04 Oct 2013 12:33:05 -0400

Changed in maas (Ubuntu):
status: Confirmed → Fix Released
Changed in maas:
status: Fix Committed → Fix Released
Changed in maas:
milestone: none → 13.10
status: Fix Released → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
Shang Wu (shangwu) wrote :

We ran into this issue on 14.04.
maas version: 1.4+bzr1820+dfsg-0ubuntu1
Deploying 12.04.3 images

Error output
failed [2/5] ( 00-maas-01-lshw 00-maas-02-virtuality)

<list xmlns:lldp="lldp" xmlns:lshw="lshw">
  <lshw:node id="b7t6c" claimed="true" class="system" handle="DMI:0001">
 <lshw:description>Desktop Computer</lshw:description>
 <lshw:product>(To be filled by O.E.M.)</lshw:product>
 <lshw:width units="bits">64</lshw:width>
 <lshw:configuration>
  <lshw:setting id="boot" value="normal"/>
  <lshw:setting id="chassis" value="desktop"/>
  <lshw:setting id="family" value="To be filled by O.E.M."/>
  <lshw:setting id="sku" value="To be filled by O.E.M."/>
  <lshw:setting id="uuid" value="007B4B0B-D534-E111-832F-ECA86BFE1858"/>
 </lshw:configuration>
 <lshw:capabilities>
  <lshw:capability id="smbios-2.7">SMBIOS version 2.7</lshw:capability>
  <lshw:capability id="dmi-2.7">DMI version 2.7</lshw:capability>
  <lshw:capability id="vsyscall32">32-bit processes</lshw:capability>
 </lshw:capabilities>
  <lshw:node id="core" claimed="true" class="bus" handle="DMI:0002">
   <lshw:description>Motherboard</lshw:description>
   <lshw:product>D53427RKE</lshw:product>
   <lshw:vendor>Intel Corporation</lshw:vendor>
   <lshw:physid>0</lshw:physid>
   <lshw:version>G87790-403</lshw:version>
   <lshw:serial>GERK34400GZZ</lshw:serial>
   <lshw:slot>To be filled by O.E.M.</lshw:slot>
    <lshw:node id="firmware" claimed="true" class="memory" handle="">
     <lshw:description>BIOS</lshw:description>
     <lshw:vendor>Intel Corp.</lshw:vendor>
     <lshw:physid>0</lshw:physid>
     <lshw:version>RKPPT10H.86A.0017.2013.0425.1251</lshw:version>
     <lshw:date>04/25/2013</lshw:date>
     <lshw:size units="bytes">65536</lshw:size>
     <lshw:capacity units="bytes">16711680</lshw:capacity>
     <lshw:capabilities>
      <lshw:capability id="pci">PCI bus</lshw:capability>
      <lshw:capability id="upgrade">BIOS EEPROM can be upgraded</lshw:capability>
      <lshw:capability id="shadowing">BIOS shadowing</lshw:capability>
      <lshw:capability id="cdboot">Booting from CD-ROM/DVD</lshw:capability>
      <lshw:capability id="bootselect">Selectable boot path</lshw:capability>
      <lshw:capability id="edd">Enhanced Disk Drive extensions</lshw:capability>
      <lshw:capability id="int13floppy1200">5.25" 1.2MB floppy</lshw:capability>
      <lshw:capability id="int13floppy720">3.5" 720KB floppy</lshw:capability>
      <lshw:capability id="int13floppy2880">3.5" 2.88MB floppy</lshw:capability>
      <lshw:capability id="int5printscreen">Print Screen key</lshw:capability>
      <lshw:capability id="int14serial">INT14 serial line control</lshw:capability>
      <lshw:capability id="int17printer">INT17 printer control</lshw:capability>
      <lshw:capability id="acpi">ACPI</lshw:capability>
      <lshw:capability id="usb">USB legacy emulation</lshw:capability>
      <lshw:capability id="biosbootspecification">BIOS boot specification</lshw:capability>
     </lshw:capabilities>
    </lshw:node>
    <lshw:node id="cache:0" claimed="true" class="memory" handle="DMI:003D">
     <lshw:description>L2 cache</lshw:description>
     <lshw:physid>3d</lshw:physid>
     <lshw:slot>CPU I...

Shang Wu (shangwu) wrote :

The error occurred because the node couldn't reach the Internet. I fixed the issue by configuring the proxy on the MAAS server.

Paul (pachen2) wrote :

I ran into the same error output, even though the node was connected to the Internet:
failed [2/5] ( 00-maas-01-lshw 00-maas-02-virtuality)

Some information on my configuration: the DHCP server is on a different box and is able to provide the correct IP, route, netmask, and gateway to the node.

I used the "Debugging ephemeral image" workaround and was able to create a user/password, which stopped it from shutting down the node machine. Once inside the node, I was able to ping out to the Internet.

Attached is my lshw log.

Thank you very much in advance!

Paul (pachen2) wrote :

Sorry, left out the important stuff:

Host MAAS version: 1.4+bzr1693+dfsg-0ubuntu2.2
Host OS version: ubuntu server 13.10
Host Kernel: 3.11.0-12-generic #19-Ubuntu

Node OS version: 12.04
Node Kernel version: 3.2.0-54-generic #82

Thanks again!

Paul,

Can you also please attach the output of /var/log/cloud-init.log and/or
cloud-init-output.log from the ephemeral image?
(In reply to Paul's comment of Feb 7, 2014, 7:25 PM.)

Paul (pachen2) wrote :

Hi Andres,

I could only find cloud-init.log under /var/log.

Please see attachment.

Thank you!

Raphaël Badin (rvb) wrote :

By the way, I just found (and fixed) bug 1278895. Because of it, the list of failed scripts in the "failed [3/5] …" message is actually wrong: it's the list of the scripts that didn't fail (!).

Here is the list of all the scripts that maas has:
00-maas-01-lshw
00-maas-02-virtuality
00-maas-03-install-lldpd
99-maas-01-wait-for-lldpd
99-maas-02-capture-lldp

When MAAS says "failed [2/5] 00-maas-01-lshw 00-maas-02-virtuality", it means '00-maas-03-install-lldpd', '99-maas-01-wait-for-lldpd' and '99-maas-02-capture-lldp' failed.
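On builds affected by bug 1278895 (before revision 1926), the scripts that really failed are therefore the complement of the names in the status message. Here is a minimal Python sketch of that inversion. It assumes the message format quoted in this thread ('failed [N/5] ( name ... )') and the five script names listed above. actually_failed is a hypothetical helper, not part of MAAS:

```python
import re

# The five commissioning scripts listed in this comment.
ALL_SCRIPTS = [
    "00-maas-01-lshw",
    "00-maas-02-virtuality",
    "00-maas-03-install-lldpd",
    "99-maas-01-wait-for-lldpd",
    "99-maas-02-capture-lldp",
]

# Assumed message shape: "failed [2/5] ( name1 name2 )".
MESSAGE_RE = re.compile(r"failed \[\d+/\d+\] \(\s*(.*?)\s*\)")


def actually_failed(message):
    """Return the scripts that really failed on a build affected by bug
    1278895, where the listed names are the scripts that did NOT fail."""
    match = MESSAGE_RE.search(message)
    if match is None:
        raise ValueError("unrecognised status message: %r" % message)
    listed = set(match.group(1).split())
    # Preserve the execution order of ALL_SCRIPTS in the result.
    return [name for name in ALL_SCRIPTS if name not in listed]


print(actually_failed("failed [2/5] ( 00-maas-01-lshw 00-maas-02-virtuality)"))
# prints ['00-maas-03-install-lldpd', '99-maas-01-wait-for-lldpd', '99-maas-02-capture-lldp']
```

Applied to the message from the original report, "failed [3/5] ( 00-maas-01-lshw 00-maas-02-virtuality 99-maas-02-capture-lldp)", it returns 00-maas-03-install-lldpd and 99-maas-01-wait-for-lldpd.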

This is a nasty bug (!). It's fixed as of revision 1926.

Raphaël Badin (rvb) wrote :

Another note: to debug commissioning problems, the quickest way is to look at the output of the scripts that failed. You can't do that with the UI right now, but it's all available if you use the API/CLI:

The following CLI command:
$ maas-cli maas commissioning-results list
will print the results of all the commissioning scripts. The 'data' field, which holds the actual output of each script, is base64-encoded.
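To read those results, one option is to save the CLI output to a file and decode each entry's 'data' field locally. This is a minimal Python sketch that assumes the command prints a JSON list of objects. Only 'data' is confirmed above; the 'name' and 'script_result' keys are assumptions about the MAAS 1.4 API, so the code falls back to placeholders when they are missing. decode_results and print_results are hypothetical helper names:

```python
import base64
import json


def decode_results(json_text):
    """Yield (name, exit_status, output) for each commissioning result.

    json_text is assumed to be the JSON list printed by
    'maas-cli maas commissioning-results list'. 'data' holds the
    base64-encoded script output; 'name' and 'script_result' are
    assumed key names and may be absent.
    """
    for entry in json.loads(json_text):
        raw = base64.b64decode(entry.get("data") or "")
        output = raw.decode("utf-8", errors="replace")
        yield entry.get("name", "<unknown>"), entry.get("script_result"), output


def print_results(path):
    """Print every decoded result from a saved JSON file."""
    with open(path) as handle:
        for name, status, output in decode_results(handle.read()):
            print("=== %s (exit status: %s) ===" % (name, status))
            print(output)
```

For example, after redirecting the CLI output to commissioning-results.json, calling print_results("commissioning-results.json") prints each script's decoded output, which is usually enough to see why a script failed.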

Andres Rodriguez (andreserl) wrote :

Paul,

Would it be possible for you to collect console output?

On the other hand, you could try to obtain the output of the scripts themselves from the database. The output is base64-encoded. Use the following command:

maas-cli <username> commissioning-results list

This yields JSON output for each of the commissioning scripts. In the 'data' field you will find the base64-encoded output. Please do share that information so we can see why the script failed.

Cheers.

Paul (pachen2) wrote :

Awesome, thank you Andres and Raphaël!!

Please see attachment.

Note to myself:
1. Login to the API by using the following steps: https://maas.ubuntu.com/docs/maascli.html
2. Then run the command: maas-cli maas commissioning-results list > commissioning-results.json

Best regards,

Jesus (vengahastaluego16) wrote :

I have a similar error. When I commission a node, I get this error:

Error output
failed [2/5] ( 00-maas-01-lshw 00-maas-02-virtuality)

I'm using Ubuntu Server 12.04.3 and Precise image for the nodes.

Thanks.

Raphaël Badin (rvb) wrote :

@Paul: the problem you're having is that your nodes can't access the outside world. They are thus unable to fetch the packages needed for commissioning (see http://paste.ubuntu.com/7195406/).

Unless you changed the http_proxy config option (on the settings page, or using the CLI/API), your nodes use the region's proxy to download packages. Have a look at the proxy's log to check whether anything looks suspicious.
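One quick way to separate a proxy problem from a routing problem is to fetch a package index through the proxy from the node itself, for example from the debugging ephemeral image mentioned earlier. Here is a minimal Python sketch using only the standard library. The proxy address and archive URL are placeholders you must replace. Port 8000 (squid-deb-proxy's usual port) is an assumption about how the region's proxy is set up:

```python
import urllib.request

# Placeholders (assumptions): replace with your region controller's proxy
# and a package-archive URL your nodes are expected to reach.
PROXY_URL = "http://192.0.2.10:8000/"
TEST_URL = "http://archive.ubuntu.com/ubuntu/dists/precise/Release"


def fetch_through_proxy(url, proxy_url, timeout=10.0):
    """Return True if `url` can be fetched with a 2xx status via `proxy_url`."""
    # An explicit ProxyHandler replaces the default one, so any http_proxy
    # environment variable is ignored for this check.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url})
    )
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError as exc:
        # urllib.error.URLError/HTTPError and socket errors are all OSError subclasses.
        print("fetch via %s failed: %s" % (proxy_url, exc))
        return False
```

If fetch_through_proxy(TEST_URL, PROXY_URL) fails on the node, the proxy's log on the region is the next place to look.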

Another common problem is when MAAS' DNS server (the BIND server installed on the region) is unable to resolve names such as 'ubuntu.com'. If this is the problem, the region's /var/log/syslog file will contain errors like 'error (network unreachable) resolving …'. To solve this: if you're using Trusty's package, just configure the 'upstream_dns' option (at the bottom of the 'settings' page, or using the API/CLI) with your upstream DNS server's address. If you're using another version, you have to do the config manually: include something like http://paste.ubuntu.com/7195395/ in /etc/bind/named.conf.options and restart the maas-dns service.
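For the manual route, the usual BIND mechanism is a 'forwarders' statement inside the existing 'options { ... };' block of /etc/bind/named.conf.options. The paste linked above is not reproduced here and may contain more than this. Below is a minimal Python sketch, assuming plain forwarders are all you need, that validates the upstream addresses and renders the statement to paste in. The addresses are documentation placeholders, and render_forwarders is a hypothetical helper:

```python
import ipaddress


def render_forwarders(upstreams):
    """Render a BIND 'forwarders' statement for the given upstream DNS servers.

    The result belongs inside the existing 'options { ... };' block of
    /etc/bind/named.conf.options; it is not a complete options block.
    """
    if not upstreams:
        raise ValueError("at least one upstream DNS server is required")
    # ipaddress.ip_address raises ValueError for anything that is not an IP.
    addresses = [str(ipaddress.ip_address(u)) for u in upstreams]
    lines = ["forwarders {"]
    lines.extend("    %s;" % addr for addr in addresses)
    lines.append("};")
    return "\n".join(lines)


print(render_forwarders(["192.0.2.53", "198.51.100.53"]))
# prints:
# forwarders {
#     192.0.2.53;
#     198.51.100.53;
# };
```

Validating the addresses first avoids restarting maas-dns with a config that BIND refuses to load.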

Rajasekar Karthik (karthik085) wrote :

I am having the same problem, but with Ubuntu 14.04 LTS:

I have an Ubuntu MAAS server running on one node.
It has two NICs. One NIC (eth1) is connected to our organization's network. The other NIC (eth0) is on a private network used to manage all the nodes in the cloud. I am running the DHCP server on eth0.

I am trying to add another node. It can PXE boot and get an IP address, but cannot connect to the external network to get packages. Can you provide details on how to resolve this? I referred to http://ideasnet.wordpress.com/category/ubuntu-14-04lts-cloud-computing-server-edition/

which helped in other cases, but not with this problem.

Thanks!

This is not a MAAS problem; you need to set up IP forwarding on your MAAS server.
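In a setup like the one above, nodes sit on the private eth0 network and reach the outside only through eth1, so the MAAS server has to route their traffic. The kernel's net.ipv4.ip_forward switch must be 1. Because the private subnet is usually not routable upstream, NAT (for example an iptables MASQUERADE rule on eth1) is typically needed as well. Forwarding is normally enabled with 'sysctl -w net.ipv4.ip_forward=1' and made persistent in /etc/sysctl.conf. Here is a minimal Python sketch that checks only the forwarding switch by reading the standard Linux procfs file. ip_forwarding_enabled is a hypothetical helper:

```python
from pathlib import Path

# Standard Linux procfs switch for IPv4 forwarding ("1" means enabled).
IP_FORWARD_PATH = Path("/proc/sys/net/ipv4/ip_forward")


def ip_forwarding_enabled(path=IP_FORWARD_PATH):
    """Return True if IPv4 forwarding is enabled according to `path`."""
    return Path(path).read_text().strip() == "1"
```

Running ip_forwarding_enabled() on the MAAS server only confirms routing is switched on; nodes still need the NAT rule (or an upstream route back to the private subnet) to reach package archives.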

Rajasekar Karthik (karthik085) wrote :

Hmm... I realize this is not a MAAS problem. I posted it on AskUbuntu and have not heard anything. As this bug seemed very similar to what I was experiencing, I was wondering if someone who solved it, or knows how to solve it, could write a quick comment on how to do so...

IP forwarding... do you mean DNS forwarding? DHCP does work, but DNS doesn't!
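When DHCP works but names don't resolve, it helps to test resolution on the node directly, separately from routing. Here is a minimal Python sketch using the system resolver (socket.getaddrinfo). It goes through whatever nameserver the node was configured with, which for MAAS-managed DNS is the BIND server on the region described earlier. resolves is a hypothetical helper:

```python
import socket


def resolves(hostname):
    """Return True if `hostname` resolves through the system resolver."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False
    return True
```

If resolves("archive.ubuntu.com") is False on a node while pinging an external IP address works, the problem is DNS (see the upstream_dns and named.conf.options advice above) rather than IP forwarding.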
