Nodes don't go to Ready: after commissioning, they get a 500 error when reporting back to MAAS

Bug #1131418 reported by Ramon Acedo
This bug affects 1 person
Affects          Status         Importance   Assigned to        Milestone
MAAS             Fix Released   Critical     Raphaël Badin
  1.2            Fix Released   Critical     Raphaël Badin
maas (Ubuntu)    Fix Released   Critical     Andres Rodriguez
  Raring         Fix Released   Undecided    Unassigned

Bug Description

maas version installed: 1.2+bzr1359+dfsg-0ubuntu1~ppa1 on Ubuntu Precise 12.04.2

We enlist a node, then accept and commission it, and the node stays in the Commissioning state.

We see that after cloud-init runs, the node tries to contact MAAS and gets an internal server error (HTTP 500).

In the maas.log file we see this:

ERROR 2013-02-21 16:11:46,283 maas.maasserver ################################ Exception: Invalid expression ################################
ERROR 2013-02-21 16:11:46,283 maas.maasserver Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.7/dist-packages/django/views/decorators/vary.py", line 22, in inner_func
    response = func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/piston/resource.py", line 166, in __call__
    result = self.error_handler(e, request, meth, em_format)
  File "/usr/lib/python2.7/dist-packages/piston/resource.py", line 164, in __call__
    result = meth(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/api.py", line 308, in dispatch
    return function(self, request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/metadataserver/api.py", line 230, in signal
    self._store_commissioning_results(node, request)
  File "/usr/lib/python2.7/dist-packages/metadataserver/api.py", line 191, in _store_commissioning_results
    node.set_hardware_details(raw_content)
  File "/usr/lib/python2.7/dist-packages/maasserver/models/node.py", line 798, in set_hardware_details
    update_hardware_details(self, xmlbytes, Tag.objects)
  File "/usr/lib/python2.7/dist-packages/maasserver/models/node.py", line 398, in update_hardware_details
    has_tag = evaluator(tag.definition)
  File "xpath.pxi", line 321, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:117734)
  File "xpath.pxi", line 239, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:116911)
  File "xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:116780)
XPathEvalError: Invalid expression

Possibly related: in Juju we set a maas-name constraint to deploy each service to a particular node (that worked), but at some point the constraints were no longer recognized (we destroyed the Juju environment and started again).

[Impact]
MAAS machines being commissioned fail to commission successfully and are never placed in the Ready state. This happens when the user has defined a tag with no definition; such tags simply serve to identify nodes. Without this fix, machines cannot commission successfully if the user has previously defined a tag with no definition.
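A minimal sketch of the kind of guard this fix implies, not the actual MAAS patch. It assumes a simplified model in which each tag is a (name, definition) pair and the node's lshw XML is hidden behind an `evaluate` callable. The names `TagSpec`, `matching_tags` and `fake_evaluate` are hypothetical. In the real code, `update_hardware_details` calls `evaluator(tag.definition)` for every tag (see the traceback), and with a definition of `u''` lxml raised `XPathEvalError: Invalid expression`.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical, simplified stand-in for a MAAS tag: (name, definition).
TagSpec = Tuple[str, str]


def matching_tags(tags: Iterable[TagSpec], evaluate: Callable[[str], bool]) -> Set[str]:
    """Return the names of tags whose definition matches the node.

    `evaluate` stands in for an XPath evaluator bound to the node's lshw
    XML: it returns a truthy value on a match and may raise on an invalid
    expression, as lxml did for the empty definition in this bug.
    """
    matched: Set[str] = set()
    for name, definition in tags:
        # A tag with no definition is a manual label used only to identify
        # nodes. There is nothing to evaluate, so skip it rather than pass
        # an empty (invalid) expression to the evaluator. Treating
        # whitespace-only definitions the same way is an assumption.
        if not definition.strip():
            continue
        if evaluate(definition):
            matched.add(name)
    return matched


def fake_evaluate(expression: str) -> bool:
    """Hypothetical evaluator that, like the real one, rejects ''."""
    if not expression:
        raise ValueError("Invalid expression")
    return expression == "//node[@class='system']"


tags = [
    ("rack-a", ""),                        # manual tag, no definition
    ("system", "//node[@class='system']"),
    ("gpu", "//node[@id='display']"),
]
print(sorted(matching_tags(tags, fake_evaluate)))  # ['system']
```

Without the `if not definition.strip()` check, the same call would raise `ValueError` on the `rack-a` tag, which mirrors how the uncaught `XPathEvalError` reached the node as a 500.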

[Test Case]
To reproduce do the following:

1. Install MAAS
2. Enlist machines
3. Define a tag with no definition
4. Try to commission the machines.
They will fail to commission or, if they do commission, their hardware details won't appear and Juju won't be able to deploy them.
5. Try the proposed fix.
The machines will now be able to commission successfully.

[Regression Potential]
Minimal; this actually fixes a regression. The fix has been tested both manually and in automated labs.


Changed in maas:
status: New → Triaged
importance: Undecided → Critical
Raphaël Badin (rvb) wrote :

Something seems to be wrong with the tag; can you please give us the value of the tag's 'definition'?

Here is how to print the definitions for all the tags:
# start python shell
sudo maas shell
# in python shell:
>>> from maasserver.models import Tag
>>> [tag.definition for tag in Tag.objects.all()]

Raphaël Badin (rvb) wrote :

Actually, looking at the code, the tag definition should be a valid expression; otherwise you would have gotten an error when the tag was created. That's why I now suspect that you have invalid hardware info (created by lshw).

Here is how to print the hardware info recorded by lshw:
# start python shell
sudo maas shell
# in python shell:
>>> from maasserver.models import Node
>>> [node.hardware_details for node in Node.objects.all()]

Ramon Acedo (ramon-linux-labs) wrote :

I captured the traffic with tcpdump and extracted the XML that the node posted over HTTP to the MAAS server and that caused the 500. It is attached here.

I can provide you the full capture if needed.

Raphaël Badin (rvb) wrote :

That XML seems perfectly valid. Could you please give me the definitions of the tags you have created on your MAAS instance (see my comment #1)? It seems that the problem stems from a bad interaction between the hardware information and a tag definition.

Ramon Acedo (ramon-linux-labs) wrote :

Raphaël, unfortunately we cannot reproduce this any more: we were in the middle of a delivery and had to rebuild MAAS.

We didn't use tags; we used constraints (not sure whether those go to the Tag Django model too?). The only constraint we used was "maas-name=node-name", set before juju deploy and before juju add-unit; we then cleared it by setting the constraint "maas-name=".

Raphaël Badin (rvb) wrote :

No, the 'maas-name' constraint doesn't use tags. That's weird, because the stack trace you pasted indicates that something blew up while the (non-empty) list of tags was being iterated.
I'm going to mark this bug Incomplete; please re-open it if you can reproduce it.

Changed in maas:
status: Triaged → Incomplete
Andres Rodriguez (andreserl) wrote :

Hi Raphael,

I'm experiencing this issue with Raring!

Andres Rodriguez (andreserl) wrote :

>>> from maasserver.models import Tag
>>> [tag.definition for tag in Tag.objects.all()]
[u'']
>>> from maasserver.models import Node
>>> [node.hardware_details for node in Node.objects.all()]
[None, None, None]

Raphaël Badin (rvb) wrote :

All right, thanks for the details… I'm investigating this…

Changed in maas:
status: Incomplete → Triaged
Changed in maas (Ubuntu):
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → Andres Rodriguez (andreserl)
description: updated
Raphaël Badin (rvb)
Changed in maas:
assignee: nobody → Raphaël Badin (rvb)
status: Triaged → Fix Committed
Clint Byrum (clint-fewbar) wrote :

Hi, this is missing Regression Potential. Please see https://wiki.ubuntu.com/StableReleaseUpdates#Procedure for more information.

description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package maas - 1.3+bzr1461+dfsg-0ubuntu3

---------------
maas (1.3+bzr1461+dfsg-0ubuntu3) saucy; urgency=low

  * debian/patches:
    - 99-fix-ipmi-stat-lp1086160.patch: Drop. The following patch removes
      the need for this fix. (LP: #1171988)
    - 99-fix-ipmi-lp1171418.patch: Do not check current node state when
      executing an ipmi command, which ensures that nodes are always
      turned on/off regardless of their power state. This fixes corner
      cases found when running automated tests. (LP: #1171418)
    - 99-fix-comissioning-lp1131418.patch: Fixes the commissioning process,
      allowing nodes to successfully commission, when tag's with no
      definition have been created. This issue will only appear when these
      special tags are created. (LP: #1131418)
    - 99-import-raring-images-lp1182642.patch: Enables the import of raring
      images by default (LP: #1182642)
    - 99-fix-new-image-install-lp1182646.patch: Fixes the installation of
      new ephemeral images, that fail due to not being able to overwrite
      a symlink. (LP: #1182646)
 -- Andres Rodriguez <email address hidden> Tue, 23 Apr 2013 14:02:33 -0400

Changed in maas (Ubuntu):
status: Confirmed → Fix Released
Changed in maas:
status: Fix Committed → Fix Released
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Ramon, or anyone else affected,

Accepted maas into raring-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/maas/1.3+bzr1461+dfsg-0ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in maas (Ubuntu Raring):
status: New → Fix Committed
tags: added: verification-needed
Andres Rodriguez (andreserl) wrote :

I have tested this and it works as expected.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package maas - 1.3+bzr1461+dfsg-0ubuntu2.1

---------------
maas (1.3+bzr1461+dfsg-0ubuntu2.1) raring-proposed; urgency=low

  * debian/patches:
    - 99-fix-ipmi-stat-lp1086160.patch: Drop. The following patch removes
      the need for this fix. (LP: #1171988)
    - 99-fix-ipmi-lp1171418.patch: Do not check current node state when
      executing an ipmi command, which ensures that nodes are always
      turned on/off regardless of their power state. This fixes corner
      cases found when running automated tests. (LP: #1171418)
    - 99-fix-comissioning-lp1131418.patch: Fixes the commissioning process,
      allowing nodes to successfully commission, when tag's with no
      definition have been created. This issue will only appear when these
      special tags are created. (LP: #1131418)
    - 99-import-raring-images-lp1182642.patch: Enables the import of raring
      images by default (LP: #1182642)
    - 99-fix-new-image-install-lp1182646.patch: Fixes the installation of
      new ephemeral images, that fail due to not being able to overwrite
      a symlink. (LP: #1182646)
 -- Andres Rodriguez <email address hidden> Tue, 23 Apr 2013 14:02:33 -0400

Changed in maas (Ubuntu Raring):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

