Comment 1 for bug 1711414

Andres Rodriguez (andreserl) wrote :

With the attached branch (https://code.launchpad.net/~andreserl/maas/+git/maas/+merge/329275), MAAS should be able to correctly stop a rack controller that is running from a snap. The branch effectively tells the MAAS snap to change its mode from "Rack Controller" to "None", which stops all of its services. However, in the UI I see the following error:

Node failed to be deleted, because of the following error:

In the logs I see:

2017-08-18 22:02:17 maasserver.websockets.protocol: [critical] Error on request (632) controller.action:

Traceback (most recent call last):
Failure: provisioningserver.rpc.exceptions.CannotDisableAndShutoffRackd:
2017-08-18 22:02:27 maasserver.rpc.regionservice: [info] Rack controller 'xh33gw' disconnected.
2017-08-18 22:02:27 RegionServer,6,::ffff:192.168.122.42: [info] RegionServer connection lost (HOST:IPv6Address(TCP, '::ffff:192.168.122.2', 5250) PEER:IPv6Address(TCP, '::ffff:192.168.122.42', 55580))

In the snap logs I see:

==> /var/snap/maas/common/log/supervisor.log <==
2017-08-18 22:02:16,955 WARN received SIGHUP indicating restart request
2017-08-18 22:02:16,958 INFO waiting for tgt, rackd to die

==> /var/snap/maas/common/log/rackd.log <==
2017-08-18 22:02:16 ClusterClient,client: [info] Received SIGTERM, shutting down.
2017-08-18 22:02:17 provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at b'http://[::ffff:127.0.0.1]:5240/MAAS/rpc/').

[...]

==> /var/snap/maas/common/log/supervisor.log <==
2017-08-18 22:02:20,781 INFO waiting for tgt, rackd to die
2017-08-18 22:02:23,788 INFO waiting for tgt, rackd to die
2017-08-18 22:02:26,794 INFO waiting for tgt, rackd to die
2017-08-18 22:02:27,797 WARN killing 'rackd' (5537) with SIGKILL
2017-08-18 22:02:28,813 INFO stopped: rackd (terminated by SIGKILL)
2017-08-18 22:02:29,818 INFO waiting for tgt to die
2017-08-18 22:02:32,825 INFO waiting for tgt to die
2017-08-18 22:02:35,832 INFO waiting for tgt to die
2017-08-18 22:02:38,839 INFO waiting for tgt to die
2017-08-18 22:02:41,846 WARN killing 'tgt' (5728) with SIGKILL
2017-08-18 22:02:41,849 INFO stopped: tgt (terminated by SIGKILL)
2017-08-18 22:02:41,871 CRIT Supervisor running as root (no user in config file)
2017-08-18 22:02:41,874 INFO RPC interface 'supervisor' initialized
2017-08-18 22:02:41,877 INFO supervisord started with pid 5502

As you can see, this correctly stopped the snap's services and the rack controller is no longer running. However, the action itself failed (see the error above), which accounts for some of the extra logging. That said, if I click "retry" in the UI, the rack controller is now removed successfully.
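To put numbers on how long supervisord waited before escalating from SIGTERM to SIGKILL, here is a small sketch that parses a subset of the supervisor.log lines quoted above. It assumes only the line layout visible in that excerpt (date, time with comma-separated milliseconds, level, message). The helper names `parse_line` and `seconds_until_sigkill` are hypothetical and are not part of MAAS or supervisor. For rackd, the roughly 10.8 s gap is consistent with supervisord's documented default `stopwaitsecs` of 10 seconds, assuming the snap's supervisor config does not override it.

```python
from datetime import datetime

# A subset of the supervisor.log lines quoted in this comment. supervisord
# writes a comma before the milliseconds in its timestamps.
SUPERVISOR_LOG = """\
2017-08-18 22:02:16,955 WARN received SIGHUP indicating restart request
2017-08-18 22:02:16,958 INFO waiting for tgt, rackd to die
2017-08-18 22:02:27,797 WARN killing 'rackd' (5537) with SIGKILL
2017-08-18 22:02:28,813 INFO stopped: rackd (terminated by SIGKILL)
2017-08-18 22:02:41,846 WARN killing 'tgt' (5728) with SIGKILL
2017-08-18 22:02:41,849 INFO stopped: tgt (terminated by SIGKILL)
"""

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_line(line):
    """Split a supervisord log line into (timestamp, level, message)."""
    date, time, level, message = line.split(" ", 3)
    return datetime.strptime(f"{date} {time}", TIMESTAMP_FORMAT), level, message


def seconds_until_sigkill(log_text):
    """Map each process name to the seconds between the SIGHUP and its SIGKILL."""
    start = None
    delays = {}
    for line in log_text.splitlines():
        when, _level, message = parse_line(line)
        if message.startswith("received SIGHUP"):
            start = when
        elif message.startswith("killing '") and start is not None:
            # The process name sits between the first pair of single quotes.
            name = message.split("'")[1]
            delays[name] = (when - start).total_seconds()
    return delays


if __name__ == "__main__":
    for name, delay in seconds_until_sigkill(SUPERVISOR_LOG).items():
        print(f"{name}: SIGKILL {delay:.3f}s after SIGHUP")
```

Run against the excerpt, this reports about 10.842 s for rackd and 24.891 s for tgt. tgt was only killed after rackd had already been stopped.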