Trying to commission a machine results into booting to the enlistment environment

Bug #1555570 reported by Michael Foord
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
MAAS
New
Critical
Unassigned

Bug Description

I'm using MAAS 2.0 on xenial with kvm images (controller and nodes are KVM images). I can enlist but not commission nodes.

Commissioning fails with a 400: Bad request showing on the node.

Note that rack-controller configuration documentation mentions using the "maas-rack" command to install a shared secret. This command appears not to exist.

https://maas.ubuntu.com/docs2.0/rack-configuration.html

I did "dpkg-reconfigure" on the maas-rack-controller which appears to have got localhost for the API address after a vanilla install. The reconfigure screen says that localhost shouldn't be used so I changed this to the correct address. After this the "settings" page of the Web UI says that the rack controller is not connected to the region (the controller page of the UI doesn't show any problems).

I see the following errors in regiond.log (nothing in maas.log

2016-03-10 11:17:55 [-] Rack controller (4y3h7n) did not import boot resources; it is not connected to the region at this time.
2016-03-10 11:17:55 [RegionServer,0,127.0.0.1] Rack controller authenticated from '127.0.0.1:38828'.
2016-03-10 11:17:55 [-] Rack controller '4y3h7n' has been registered.
2016-03-10 11:17:56 [-] Failed to register rack controller 'None' into the database. Connection has been dropped.
 Traceback (most recent call last):
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1184, in gotResult
     _inlineCallbacks(r, g, deferred)
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1126, in _inlineCallbacks
     result = result.throwExceptionIntoGenerator(g)
   File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
     return g.throw(self.type, self.value, self.tb)
   File "/usr/lib/python3/dist-packages/maasserver/rpc/regionservice.py", line 523, in register
     log.err(exc, msg)
 --- <exception caught here> ---
   File "/usr/lib/python3/dist-packages/maasserver/rpc/regionservice.py", line 481, in register
     nodegroup_uuid=nodegroup_uuid)
   File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 246, in inContext
     result = inContext.theWork()
   File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 262, in <lambda>
     inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
   File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 118, in callWithContext
     return self.currentContext().callWithContext(ctx, func, *args, **kw)
   File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 81, in callWithContext
     return func(*args,**kw)
   File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 197, in wrapper
     return func(*args, **kwargs)
   File "/usr/lib/python3/dist-packages/maasserver/utils/orm.py", line 448, in call_within_transaction
     return func_outside_txn(*args, **kwargs)
   File "/usr/lib/python3/dist-packages/maasserver/utils/orm.py", line 275, in retrier
     return func(*args, **kwargs)
   File "/usr/lib/python3.5/contextlib.py", line 30, in inner
     return func(*args, **kwds)
   File "/usr/lib/python3/dist-packages/maasserver/rpc/rackcontrollers.py", line 56, in register_rackcontroller
     rackcontroller.update_interfaces(interfaces) # Calls save.
   File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 2932, in update_interfaces
     interface = self._update_interface(name, interfaces[name])
   File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 2960, in _update_interface
     return self._update_physical_interface(name, config)
   File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 3011, in _update_physical_interface
     interface.save()
   File "/usr/lib/python3/dist-packages/maasserver/models/cleansave.py", line 29, in save
     return super(CleanSave, self).save(*args, **kwargs)
   File "/usr/lib/python3/dist-packages/maasserver/models/timestampedmodel.py", line 64, in save
     return super(TimestampedModel, self).save(*args, **kwargs)
   File "/usr/lib/python3/dist-packages/django/db/models/base.py", line 734, in save
     force_update=force_update, update_fields=update_fields)
   File "/usr/lib/python3/dist-packages/django/db/models/base.py", line 762, in save_base
     updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
   File "/usr/lib/python3/dist-packages/django/db/models/base.py", line 827, in _save_table
     forced_update)
   File "/usr/lib/python3/dist-packages/django/db/models/base.py", line 877, in _do_update
     return filtered._update(values) > 0
   File "/usr/lib/python3/dist-packages/django/db/models/query.py", line 580, in _update
     return query.get_compiler(self.db).execute_sql(CURSOR)
   File "/usr/lib/python3/dist-packages/django/db/models/sql/compiler.py", line 1062, in execute_sql
     cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
   File "/usr/lib/python3/dist-packages/django/db/models/sql/compiler.py", line 840, in execute_sql
     cursor.execute(sql, params)
   File "/usr/lib/python3/dist-packages/django/db/backends/utils.py", line 64, in execute
     return self.cursor.execute(sql, params)
   File "/usr/lib/python3/dist-packages/django/db/utils.py", line 98, in __exit__
     six.reraise(dj_exc_type, dj_exc_value, traceback)
   File "/usr/lib/python3/dist-packages/django/utils/six.py", line 685, in reraise
     raise value.with_traceback(tb)
   File "/usr/lib/python3/dist-packages/django/db/backends/utils.py", line 64, in execute
     return self.cursor.execute(sql, params)
 django.db.utils.OperationalError: deadlock detected
 DETAIL: Process 30837 waits for ShareLock on transaction 1368; blocked by process 30790.
 Process 30790 waits for ShareLock on transaction 1367; blocked by process 30837.
 HINT: See server log for query details.
 CONTEXT: while updating tuple (0,29) in relation "maasserver_interface"

2016-03-10 11:17:56 [-] Failed to register rack controller 'None' into the database. Connection has been dropped.
 Traceback (most recent call last):
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1184, in gotResult
     _inlineCallbacks(r, g, deferred)
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1126, in _inlineCallbacks
     result = result.throwExceptionIntoGenerator(g)
   File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
     return g.throw(self.type, self.value, self.tb)
   File "/usr/lib/python3/dist-packages/maasserver/rpc/regionservice.py", line 523, in register
     log.err(exc, msg)
 --- <exception caught here> ---
   File "/usr/lib/python3/dist-packages/maasserver/rpc/regionservice.py", line 481, in register
     nodegroup_uuid=nodegroup_uuid)
   File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 246, in inContext
     result = inContext.theWork()
   File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 262, in <lambda>
     inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
   File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 118, in callWithContext
     return self.currentContext().callWithContext(ctx, func, *args, **kw)
   File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 81, in callWithContext
     return func(*args,**kw)
   File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 197, in wrapper
     return func(*args, **kwargs)
   File "/usr/lib/python3/dist-packages/maasserver/utils/orm.py", line 448, in call_within_transaction
     return func_outside_txn(*args, **kwargs)
   File "/usr/lib/python3/dist-packages/maasserver/utils/orm.py", line 275, in retrier
     return func(*args, **kwargs)
   File "/usr/lib/python3.5/contextlib.py", line 30, in inner
     return func(*args, **kwds)
   File "/usr/lib/python3/dist-packages/maasserver/rpc/rackcontrollers.py", line 56, in register_rackcontroller
     rackcontroller.update_interfaces(interfaces) # Calls save.
   File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 2932, in update_interfaces
     interface = self._update_interface(name, interfaces[name])
   File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 2960, in _update_interface
     return self._update_physical_interface(name, config)
   File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 3011, in _update_physical_interface
     interface.save()
   File "/usr/lib/python3/dist-packages/maasserver/models/cleansave.py", line 29, in save
     return super(CleanSave, self).save(*args, **kwargs)
   File "/usr/lib/python3/dist-packages/maasserver/models/timestampedmodel.py", line 64, in save
     return super(TimestampedModel, self).save(*args, **kwargs)
   File "/usr/lib/python3/dist-packages/django/db/models/base.py", line 734, in save
     force_update=force_update, update_fields=update_fields)
   File "/usr/lib/python3/dist-packages/django/db/models/base.py", line 762, in save_base
     updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
   File "/usr/lib/python3/dist-packages/django/db/models/base.py", line 827, in _save_table
     forced_update)
   File "/usr/lib/python3/dist-packages/django/db/models/base.py", line 877, in _do_update
     return filtered._update(values) > 0
   File "/usr/lib/python3/dist-packages/django/db/models/query.py", line 580, in _update
     return query.get_compiler(self.db).execute_sql(CURSOR)
   File "/usr/lib/python3/dist-packages/django/db/models/sql/compiler.py", line 1062, in execute_sql
     cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
   File "/usr/lib/python3/dist-packages/django/db/models/sql/compiler.py", line 840, in execute_sql
     cursor.execute(sql, params)
   File "/usr/lib/python3/dist-packages/django/db/backends/utils.py", line 64, in execute
     return self.cursor.execute(sql, params)
   File "/usr/lib/python3/dist-packages/django/db/utils.py", line 98, in __exit__
     six.reraise(dj_exc_type, dj_exc_value, traceback)
   File "/usr/lib/python3/dist-packages/django/utils/six.py", line 685, in reraise
     raise value.with_traceback(tb)
   File "/usr/lib/python3/dist-packages/django/db/backends/utils.py", line 64, in execute
     return self.cursor.execute(sql, params)
 django.db.utils.OperationalError: deadlock detected
 DETAIL: Process 30862 waits for ShareLock on transaction 1367; blocked by process 30837.
 Process 30837 waits for ShareLock on transaction 1368; blocked by process 30790.
 Process 30790 waits for ExclusiveLock on tuple (0,31) of relation 16789 of database 16385; blocked by process 30862.
 HINT: See server log for query details.
 CONTEXT: while updating tuple (0,31) in relation "maasserver_interface"

Plus the following in rackd.log

2016-03-10 11:17:55+0000 [-] Log opened.
2016-03-10 11:17:55+0000 [-] twistd 16.0.0pre1 (/usr/bin/python3 3.5.1) starting up.
2016-03-10 11:17:55+0000 [-] reactor class: twisted.internet.epollreactor.EPollReactor.
2016-03-10 11:17:55+0000 [-] Site starting on 5248
2016-03-10 11:17:55+0000 [-] <class 'provisioningserver.pserv_services.lease_socket_service.LeaseSocketService'> starting on '/var/lib/maas/dhcpd.sock'
2016-03-10 11:17:55+0000 [-] Starting protocol <provisioningserver.pserv_services.lease_socket_service.LeaseSocketService object at 0x7f94e7e66358>
2016-03-10 11:17:55+0000 [-] TFTP starting on 69
2016-03-10 11:17:55+0000 [-] Starting protocol <tftp.protocol.TFTP object at 0x7f94e7e38ac8>
2016-03-10 11:17:55+0000 [-] TFTP Listener started at 10.0.3.1:69
2016-03-10 11:17:55+0000 [-] TFTP starting on 69
2016-03-10 11:17:55+0000 [-] Starting protocol <tftp.protocol.TFTP object at 0x7f94e7e38d68>
2016-03-10 11:17:55+0000 [-] TFTP Listener started at 172.16.0.2:69
2016-03-10 11:17:55+0000 [-] TFTP starting on 69
2016-03-10 11:17:55+0000 [-] Starting protocol <tftp.protocol.TFTP object at 0x7f94e7e38ef0>
2016-03-10 11:17:55+0000 [-] TFTP Listener started at 127.0.0.1:69
2016-03-10 11:17:55+0000 [-] TFTP starting on 69
2016-03-10 11:17:55+0000 [-] Starting protocol <tftp.protocol.TFTP object at 0x7f94e7e38f98>
2016-03-10 11:17:55+0000 [-] TFTP Listener started at ::1:69
2016-03-10 11:17:55+0000 [Uninitialized] ClusterClient connection established (HOST:IPv4Address(TCP, '127.0.0.1', 51140) PEER:IPv4Address(TCP, '127.0.0.1', 5252))
2016-03-10 11:17:55+0000 [Uninitialized] ClusterClient connection established (HOST:IPv4Address(TCP, '127.0.0.1', 36800) PEER:IPv4Address(TCP, '127.0.0.1', 5251))
2016-03-10 11:17:55+0000 [Uninitialized] ClusterClient connection established (HOST:IPv4Address(TCP, '127.0.0.1', 38828) PEER:IPv4Address(TCP, '127.0.0.1', 5253))
2016-03-10 11:17:55+0000 [Uninitialized] ClusterClient connection established (HOST:IPv4Address(TCP, '10.0.3.1', 37464) PEER:IPv4Address(TCP, '10.0.3.1', 5250))
2016-03-10 11:17:55+0000 [ClusterClient,client] Event-loop 'maas-controller:pid=30606' authenticated.
2016-03-10 11:17:55+0000 [ClusterClient,client] Event-loop 'maas-controller:pid=30820' authenticated.
2016-03-10 11:17:55+0000 [ClusterClient,client] Event-loop 'maas-controller:pid=30764' authenticated.
2016-03-10 11:17:55+0000 [ClusterClient,client] Event-loop 'maas-controller:pid=30813' authenticated.
2016-03-10 11:17:55+0000 [ClusterClient,client] Rack controller '4y3h7n' registered (via maas-controller:pid=30606).
2016-03-10 11:17:56+0000 [ClusterClient,client] Rack controller REJECTED by the region (via maas-controller:pid=30813).
2016-03-10 11:17:56+0000 [ClusterClient,client] Rack controller REJECTED by the region (via maas-controller:pid=30820).

...

2016-03-10 11:18:38+0000 [HTTPPageGetter,client] Starting TFTP back-end failed.
 Traceback (most recent call last):
 Failure: twisted.web.error.Error: 500 INTERNAL SERVER ERROR

2016-03-10 11:18:38+0000 [HTTPPageGetter,client] Starting TFTP back-end failed.
 Traceback (most recent call last):
 Failure: twisted.web.error.Error: 500 INTERNAL SERVER ERROR

Revision history for this message
Michael Foord (mfoord) wrote :

I'm not convinced this is a duplicate of the linked bug. I'm running maas in a KVM instance and MAAS is providing DHCP for the subnet the nodes are on.

Revision history for this message
Michael Foord (mfoord) wrote :
Revision history for this message
Michael Foord (mfoord) wrote :
Revision history for this message
Michael Foord (mfoord) wrote :
Revision history for this message
Michael Foord (mfoord) wrote :
Revision history for this message
Michael Foord (mfoord) wrote :
Revision history for this message
Michael Foord (mfoord) wrote :

The BMC is virsh, but I'm not providing a power type to maas and am manually powering machines on. Which further makes me suspect it's not a duplicate of bug #1551376

Changed in maas:
importance: Undecided → Critical
Revision history for this message
Michael Foord (mfoord) wrote :
Revision history for this message
Michael Foord (mfoord) wrote :
Revision history for this message
Andres Rodriguez (andreserl) wrote :

2016-03-11 14:57:52 [maasserver] ERROR: ################################ Exception: 'NoneType' object has no attribute 'url' ################################
2016-03-11 14:57:52 [maasserver] ERROR: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 132, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/utils/views.py", line 180, in view_atomic_with_post_commit_savepoint
    return view_atomic(*args, **kwargs)
  File "/usr/lib/python3.5/contextlib.py", line 30, in inner
    return func(*args, **kwds)
  File "/usr/lib/python3/dist-packages/maasserver/api/pxeconfig.py", line 202, in pxeconfig
    preseed_url = compose_preseed_url(node)
  File "/usr/lib/python3/dist-packages/maasserver/preseed.py", line 704, in compose_preseed_url
    base_url = node.get_boot_primary_rack_controller().url
AttributeError: 'NoneType' object has no attribute 'url'

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Please attach the output of:

maas <maas-user> machine read <machine-system-id>
and
maas <maas-user> interfaces read <node-system-id>

Revision history for this message
Andres Rodriguez (andreserl) wrote :

The only caveat in this bug, is that the power is set to 'manual'

summary: - Problem commissioning nodes (2.0)
+ Trying to commission a machine results into trying to enlist
summary: - Trying to commission a machine results into trying to enlist
+ Trying to commission a machine results into booting to the enlistment
+ environment
Changed in maas:
milestone: none → 2.0.0
Revision history for this message
Michael Foord (mfoord) wrote :

Still using manual power and KVM instances for maas controller and nodes.

With the current version of maas in experimental3 ppa I can't enlist. pxe doesn't happen (no bootable devices).

With the current version from next-proposed I can enlist, but on commissioning I get repeated warnings from url_helper.py (cloud init output):

Calling http://169.254.169.254/2009-04-04/meta-data/instance-id failed (No route to host)
Calling http://172.16.0.1/latest/meta-data/instance-id failed (404)

Revision history for this message
Michael Foord (mfoord) wrote :

Note that with next-proposed and the virsh powertype set on the node I *still* see the url error on attempting to commission.

Revision history for this message
Michael Foord (mfoord) wrote :
Download full text (3.8 KiB)

From the machine running the next-proposed version of maas, with a "commissioning" node:

$ maas maas machine read 4y3hae
Success.
Machine-readable output follows:
{
    "status_name": "Commissioning",
    "node_type_name": "Machine",
    "disable_ipv4": false,
    "system_id": "4y3hae",
    "netboot": true,
    "storage": 0.0,
    "power_state": "on",
    "power_type": "virsh",
    "owner": "maas-admin",
    "resource_uri": "/MAAS/api/2.0/machines/4y3hae/",
    "address_ttl": null,
    "zone": {
        "resource_uri": "/MAAS/api/2.0/zones/default/",
        "description": "",
        "name": "default"
    },
    "architecture": "amd64/generic",
    "virtualblockdevice_set": [],
    "physicalblockdevice_set": [],
    "hostname": "gallinaceous-leann",
    "status": 1,
    "status_message": "ubuntu/amd64/generic/trusty/release/boot-initrd",
    "osystem": "",
    "cpu_count": 0,
    "node_type": 0,
    "memory": 0,
    "ip_addresses": [],
    "domain": {
        "ttl": null,
        "authoritative": true,
        "resource_uri": "/MAAS/api/2.0/domains/0/",
        "id": 0,
        "resource_record_count": 0,
        "name": "maas"
    },
    "special_filesystems": [],
    "hwe_kernel": "",
    "tag_names": [],
    "min_hwe_kernel": "",
    "distro_series": "",
    "boot_disk": null,
    "fqdn": "gallinaceous-leann.maas",
    "macaddress_set": [
        {
            "mac_address": "52:54:00:ee:92:3e"
        }
    ],
    "interface_set": [
        {
            "tags": [],
            "discovered": null,
            "vlan": {
                "resource_uri": "/MAAS/api/2.0/vlans/1/",
                "vid": 0,
                "mtu": 1500,
                "dhcp_on": false,
                "secondary_rack": null,
                "id": 1,
                "fabric": "fabric-0",
                "primary_rack": null,
                "name": "untagged"
            },
            "id": 8,
            "type": "physical",
            "resource_uri": "/MAAS/api/2.0/nodes/4y3hae/interfaces/8/",
            "children": [],
            "mac_address": "52:54:00:ee:92:3e",
            "parents": [],
            "links": [],
            "effective_mtu": 1500,
            "params": "",
            "name": "eth0",
            "enabled": true
        }
    ],
    "swap_size": null,
    "boot_interface": {
        "tags": [],
        "discovered": null,
        "vlan": {
            "resource_uri": "/MAAS/api/2.0/vlans/1/",
            "vid": 0,
            "mtu": 1500,
            "dhcp_on": false,
            "secondary_rack": null,
            "id": 1,
            "fabric": "fabric-0",
            "primary_rack": null,
            "name": "untagged"
        },
        "id": 8,
        "type": "physical",
        "resource_uri": "/MAAS/api/2.0/nodes/4y3hae/interfaces/8/",
        "children": [],
        "mac_address": "52:54:00:ee:92:3e",
        "parents": [],
        "links": [],
        "effective_mtu": 1500,
        "params": "",
        "name": "eth0",
        "enabled": true
    },
    "blockdevice_set": [],
    "status_action": ""
}
maas-admin@maas-controller:~$ maas maas interfaces read 4y3hae
Success.
Machine-readable output follows:
[
    {
        "tags"...

Read more...

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.