update of maas-cluster-controller on trusty dumps traceback and crashes

Bug #1302772 reported by Jeff Lane 
This bug affects 2 people
Affects         Status         Importance   Assigned to           Milestone
MAAS            Fix Released   Critical     Jeroen T. Vermeulen
1.5             Fix Released   Critical     Jeroen T. Vermeulen
maas (Ubuntu)   Fix Released   Critical     Unassigned

Bug Description

Updated my trusty MAAS server today using apt-get dist-upgrade.

In the process, it updated maas-cluster-controller. After asking me if I wanted to replace or keep certain config files, it then dumped a traceback and just froze the console.

maas-dhcp-server start/running, process 10590
Setting up python-maas-provisioningserver (1.5+bzr2204-0ubuntu1) ...
Setting up maas-cluster-controller (1.5+bzr2204-0ubuntu1) ...
Installing new version of config file /etc/maas/templates/dhcp/dhcpd.conf.template ...
Installing new version of config file /etc/maas/templates/power/sm15k.template ...

Configuration file '/etc/maas/templates/power/ipmi.template'
 ==> Modified (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
   What would you like to do about it ? Your options are:
    Y or I : install the package maintainer's version
    N or O : keep your currently-installed version
      D : show the differences between the versions
      Z : start a shell to examine the situation
 The default action is to keep your current version.
*** ipmi.template (Y/I/N/O/D/Z) [default=N] ?
Installing new version of config file /etc/maas/templates/pxe/config.install.armhf.template ...
Installing new version of config file /etc/maas/maas-cluster-http.conf ...

Configuration file '/etc/maas/pserv.yaml'
 ==> Modified (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
   What would you like to do about it ? Your options are:
    Y or I : install the package maintainer's version
    N or O : keep your currently-installed version
      D : show the differences between the versions
      Z : start a shell to examine the situation
 The default action is to keep your current version.
*** pserv.yaml (Y/I/N/O/D/Z) [default=N] ?
 * Restarting web server apache2 AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
                                                             [ OK ]
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/dist-packages/provisioningserver/__main__.py", line 43, in <module>
    main()
  File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/__init__.py", line 592, in __call__
    self.execute(argv)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/__init__.py", line 587, in execute
    args.handler.run(args)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/upgrade_cluster.py", line 210, in run
    hook()
  File "/usr/lib/python2.7/dist-packages/provisioningserver/upgrade_cluster.py", line 185, in generate_boot_resources_config
    rewrite_boot_resources_config(config_file)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/upgrade_cluster.py", line 168, in rewrite_boot_resources_config
    tftproot = Config.load_from_cache()['tftp']['root']
  File "/usr/lib/python2.7/dist-packages/provisioningserver/config.py", line 240, in load_from_cache
    cls._cache[filename] = cls.parse(stream)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/config.py", line 183, in parse
    return cls.to_python(yaml.safe_load(stream))
  File "/usr/lib/python2.7/dist-packages/formencode/api.py", line 439, in to_python
    value = tp(value, state)
  File "/usr/lib/python2.7/dist-packages/formencode/schema.py", line 161, in _to_python
    value_dict, state)
formencode.api.Invalid: The input field 'boot' was not expected.

This is ps showing the defunct process:
 7573 ? S 0:01 \_ sshd: ubuntu@pts/7
 7574 pts/7 Ss 0:00 \_ -bash
10383 pts/7 S+ 0:00 \_ sudo apt-get dist-upgrade
10384 pts/7 S+ 1:21 \_ apt-get dist-upgrade
24035 pts/13 Ss+ 0:01 \_ /usr/bin/dpkg --status-fd 63 --configure libitm1:amd64 libgomp1:amd64 libgo4:amd64 l
10639 pts/13 S+ 0:00 \_ /usr/bin/perl -w /usr/share/debconf/frontend /var/lib/dpkg/info/maas-cluster-con
10647 pts/13 Z+ 0:00 \_ [maas-cluster-co] <defunct>


Jeff Lane  (bladernr) wrote :

I killed all that and then retried like so, and got the same failure:
ubuntu@critical-maas:~$ sudo apt-get dist-upgrade
[sudo] password for ubuntu:
E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.
ubuntu@critical-maas:~$ sudo dpkg --configure -a
Setting up uvtool (0~bzr92-0ubuntu1) ...
Setting up python-sip (4.15.5-1build1) ...
Setting up amtterm (1.3-1ubuntu1) ...
Processing triggers for initramfs-tools (0.103ubuntu4) ...
update-initramfs: Generating /boot/initrd.img-3.13.0-22-generic
Setting up python-iscpy (1.05-0ubuntu2) ...
Setting up pulseaudio-utils (1:4.0-0ubuntu11) ...
Setting up xserver-xorg-video-sis (1:0.10.7-0ubuntu6) ...
Setting up python-pil (2.3.0-1ubuntu3) ...
Setting up usb-creator-common (0.2.56) ...
Setting up python-qt4-dbus (4.10.4+dfsg-1build1) ...
Setting up ssh-askpass-gnome (1:6.6p1-2) ...
Setting up ubuntu-artwork (1:14.04+14.04.20140402-0ubuntu1) ...
Setting up indicator-sound (12.10.2+14.04.20140401-0ubuntu1) ...
Installing new version of config file /etc/xdg/autostart/indicator-sound.desktop ...
Setting up cloud-image-utils (0.27-0ubuntu8) ...
Setting up python-smbc (1.0.14.1-0ubuntu2) ...
Setting up python3-lxml (3.3.3-1) ...
Setting up unattended-upgrades (0.82.1ubuntu2) ...
Setting up python3-markupsafe (0.18-1build2) ...
Setting up python-cups (1.9.66-0ubuntu2) ...
Processing triggers for ureadahead (0.100.0-16) ...
Setting up curtin-common (0.1.0~bzr124-0ubuntu1) ...
Setting up python3-cairo (1.10.0+dfsg-3ubuntu2) ...
Setting up juju-core (1.17.7-0ubuntu1) ...
update-alternatives: using /usr/lib/juju-1.17.7/bin/juju to provide /usr/bin/juju (juju) in auto mode
Setting up python3-simplestreams (0.1.0~bzr341-0ubuntu1) ...
Setting up python-pam (0.4.2-13.1ubuntu3) ...
Setting up unity-scope-texdoc (0.1+14.04.20140328-0ubuntu1) ...
Setting up lightdm (1.9.14-0ubuntu2) ...
Installing new version of config file /etc/apparmor.d/abstractions/lightdm ...
Setting up python-qt4 (4.10.4+dfsg-1build1) ...
Setting up python-markupsafe (0.18-1build2) ...
Setting up shotwell-common (0.18.0-0ubuntu4) ...
Setting up python-imaging (2.3.0-1ubuntu3) ...
Processing triggers for libc-bin (2.19-0ubuntu3) ...
Setting up python3-magic (1:5.14-2ubuntu3) ...
Setting up python3-software-properties (0.92.35) ...
Setting up unity-scope-calculator (0.1+14.04.20140328-0ubuntu1) ...
Setting up mobile-broadband-provider-info (20140317-1) ...
Setting up uvtool-libvirt (0~bzr92-0ubuntu1) ...
Pool uvtool marked as autostarted
Setting up onboard (1.0.0-0ubuntu3) ...
Setting up python-commandnotfound (0.3ubuntu12) ...
Setting up python3-yaml (3.10-4build4) ...
Setting up maas-cluster-controller (1.5+bzr2204-0ubuntu1) ...
 * Restarting web server apache2 AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
                                                                 [ OK ]
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/pytho...


Julian Edwards (julian-edwards) wrote :

When the upgrade got to here:

Configuration file '/etc/maas/pserv.yaml'
 ==> Modified (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
   What would you like to do about it ? Your options are:
    Y or I : install the package maintainer's version
    N or O : keep your currently-installed version
      D : show the differences between the versions
      Z : start a shell to examine the situation
 The default action is to keep your current version.
*** pserv.yaml (Y/I/N/O/D/Z) [default=N] ?

Did you respond with an N (or equivalent) ?

Changed in maas:
status: New → Incomplete
Jeroen T. Vermeulen (jtv) wrote :

You should be able to work around this by editing /etc/maas/pserv.yaml and commenting out, or removing, the entire section called “boot” before the upgrade.
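For anyone who would rather script that edit than do it by hand, here is a rough sketch. It is not part of MAAS: the helper name `comment_out_section` and the sample file contents are made up for illustration, and it works on plain text lines rather than parsing YAML, so it assumes the section header sits at column 0 and its body is indented.

```python
def comment_out_section(text, section):
    """Prefix a top-level YAML key and its indented body with '#'."""
    out = []
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#")
        # A non-blank, non-comment line at column 0 starts a new top-level key.
        if stripped and not is_comment and not line[0].isspace():
            in_section = ":" in line and line.split(":", 1)[0] == section
        if in_section and stripped and not is_comment:
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Made-up sample contents; a real pserv.yaml will differ.
SAMPLE = """\
tftp:
  root: /var/lib/maas/tftp
boot:
  ephemeral:
    images_directory: /var/lib/maas/ephemeral
broker:
  host: localhost
"""

print(comment_out_section(SAMPLE, "boot"))
```

Run it against a backup copy of /etc/maas/pserv.yaml first and diff the result. The point of walking the whole section is that indented children are commented out along with the header, so nothing is left dangling.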

We should have kept the “boot” section of the schema (at least as it appeared in Saucy, before it grew fancy) just for compatibility. I think I first added those back for compatibility when somebody else had removed them, and then later stupidly removed them again myself.
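To illustrate the compatibility idea, here is a minimal sketch in plain Python. It is not the formencode schema MAAS uses and not the fix that landed; the key sets and the `validate` function are assumptions chosen only to show the difference between rejecting a deprecated section and accepting it and then ignoring it.

```python
# Illustrative key sets only; the real schema may accept different keys.
CURRENT_KEYS = {"logfile", "oops", "broker", "tftp"}
DEPRECATED_KEYS = {"boot"}


def validate(config, tolerate_deprecated):
    """Return the config restricted to current keys.

    Unknown keys always raise. Deprecated keys raise only when
    tolerate_deprecated is False; otherwise they are silently dropped.
    """
    cleaned = {}
    for key, value in config.items():
        if key in CURRENT_KEYS:
            cleaned[key] = value
        elif tolerate_deprecated and key in DEPRECATED_KEYS:
            continue  # accepted for compatibility, then ignored
        else:
            raise ValueError("The input field %r was not expected." % key)
    return cleaned


old_config = {"tftp": {"root": "/var/lib/maas/tftp"}, "boot": {}}

# Strict validation reproduces the kind of failure seen in the traceback.
try:
    validate(old_config, tolerate_deprecated=False)
except ValueError as error:
    print(error)

# Tolerant validation lets an old config file load.
print(validate(old_config, tolerate_deprecated=True))
```

The strict call mirrors what happens when a user keeps their old config file during the upgrade; the tolerant call is the behaviour an upgrade hook needs in order to read that file at all.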

Changed in maas:
status: Incomplete → Triaged
Changed in maas:
importance: Undecided → Critical
Changed in maas:
assignee: nobody → Jeroen T. Vermeulen (jtv)
status: Triaged → In Progress
Jeff Lane  (bladernr) wrote :

Well, I commented out the boot section (all I had to do was comment out the header "boot:", as everything else was already commented).

Ran dpkg --configure -a and it ended up choking on "ephemeral:"

Setting up libgo4:amd64 (4.8.2-19ubuntu1) ...
Setting up apt-utils (0.9.15.4ubuntu5) ...
Setting up python-ubuntu-sso-client (13.10-0ubuntu6) ...
Setting up libgomp1:amd64 (4.8.2-19ubuntu1) ...
Setting up maas-cluster-controller (1.5+bzr2204-0ubuntu1) ...
 * Restarting web server apache2 AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
                                                                      [ OK ]
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/dist-packages/provisioningserver/__main__.py", line 43, in <module>
    main()
  File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/__init__.py", line 592, in __call__
    self.execute(argv)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/__init__.py", line 587, in execute
    args.handler.run(args)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/upgrade_cluster.py", line 210, in run
    hook()
  File "/usr/lib/python2.7/dist-packages/provisioningserver/upgrade_cluster.py", line 185, in generate_boot_resources_config
    rewrite_boot_resources_config(config_file)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/upgrade_cluster.py", line 168, in rewrite_boot_resources_config
    tftproot = Config.load_from_cache()['tftp']['root']
  File "/usr/lib/python2.7/dist-packages/provisioningserver/config.py", line 240, in load_from_cache
    cls._cache[filename] = cls.parse(stream)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/config.py", line 183, in parse
    return cls.to_python(yaml.safe_load(stream))
  File "/usr/lib/python2.7/dist-packages/formencode/api.py", line 439, in to_python
    value = tp(value, state)
  File "/usr/lib/python2.7/dist-packages/formencode/schema.py", line 223, in _to_python
    value_dict, state, error_dict=errors)
formencode.api.Invalid: tftp: The input field 'ephemeral' was not expected.

Jeroen T. Vermeulen (jtv) wrote :

Sounds as if there was a piece of the boot section that wasn't commented out after all... The “ephemeral” key belongs in the boot section.
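One plausible reading of the "tftp:" prefix in that error, offered as a guess: with only the "boot:" header commented out, an uncommented child at the same indentation attaches to the section above it. The sketch below shows the effect on a made-up sample. The helper `children_by_section` is hypothetical and uses a simple line scan rather than a YAML parser.

```python
def children_by_section(text):
    """Map each top-level key to the keys indented directly beneath it."""
    sections = {}
    current = None
    child_indent = None
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blanks, comments, and anything that is not a "key:" line.
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        indent = len(line) - len(line.lstrip())
        key = stripped.split(":", 1)[0]
        if indent == 0:
            current = key
            child_indent = None
            sections[current] = []
        elif current is not None:
            if child_indent is None:
                child_indent = indent  # first child fixes the child level
            if indent == child_indent:
                sections[current].append(key)
    return sections


# Made-up sample: only the "boot:" header is commented out.
SAMPLE = """\
tftp:
  root: /var/lib/maas/tftp
#boot:
  ephemeral:
    images_directory: /var/lib/maas/ephemeral
"""

print(children_by_section(SAMPLE))
```

On this sample, "ephemeral" is listed under "tftp", which would match an error reported against the tftp section.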

Jeroen T. Vermeulen (jtv) wrote :

A fix for this incompatibility has landed in both trunk and 1.5. Andres will see it into the package.

Jeff Lane  (bladernr) wrote :

Also, to answer Julian in comment #2: correct, I chose N when asked about overwriting pserv.yaml. I never allow automatic overwriting of config files, and I would imagine that most people do likewise, as that tends to break things when custom configs disappear automatically during upgrades...

(been bitten by that more than once, so I just never do it anymore).

Ryan Beisner (1chb1n) wrote :

FYI - in upgrading a previously happy Trusty MAAS test environment, we can reproduce this issue.

$ sudo apt-get update && sudo apt-get upgrade
...
Setting up maas-cluster-controller (1.5+bzr2227-0ubuntu1) ...
Installing new version of config file /etc/maas/templates/power/sm15k.template ...
 * Restarting web server apache2
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.17.17.200. Set the 'ServerName' directive globally to suppress this message
   ...done.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/dist-packages/provisioningserver/__main__.py", line 42, in <module>
    main()
  File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/__init__.py", line 592, in __call__
    self.execute(argv)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/__init__.py", line 587, in execute
    args.handler.run(args)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/upgrade_cluster.py", line 210, in run
    hook()
  File "/usr/lib/python2.7/dist-packages/provisioningserver/upgrade_cluster.py", line 185, in generate_boot_resources_config
    rewrite_boot_resources_config(config_file)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/upgrade_cluster.py", line 168, in rewrite_boot_resources_config
    tftproot = Config.load_from_cache()['tftp']['root']
  File "/usr/lib/python2.7/dist-packages/provisioningserver/config.py", line 240, in load_from_cache
    cls._cache[filename] = cls.parse(stream)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/config.py", line 183, in parse
    return cls.to_python(yaml.safe_load(stream))
  File "/usr/lib/python2.7/dist-packages/formencode/api.py", line 439, in to_python
    value = tp(value, state)
  File "/usr/lib/python2.7/dist-packages/formencode/schema.py", line 161, in _to_python
    value_dict, state)
formencode.api.Invalid: The input field 'boot' was not expected.

* The process never completes; the user has to break out.

Changed in maas (Ubuntu):
status: New → Confirmed
importance: Undecided → Critical
Ryan Beisner (1chb1n) wrote :

After the failed apt-get upgrade of the maas packages, the formerly-happy Trusty MAAS deployment is non-functional. The front end gives 'Internal server error.'

##### txlongpoll.log #####
2014-04-07 16:42:45-0500 [AMQClientWithCallback,client] Stopping factory <txlongpoll.client.AMQFactory instance at 0x7f4f4d11f7e8>
2014-04-07 16:42:45-0500 [AMQClientWithCallback,client] Logged OOPS id OOPS-09ff5a9629126f39d99d364b93ae9cd2: No exception type: No exception value
2014-04-07 16:42:45-0500 [AMQClientWithCallback,client] Logged OOPS id OOPS-bffd13f4baf74b1e4bdbdc33592e44ad: Closed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.

##### pserv.log #####
2014-04-02 20:49:18-0500 [Uninitialized] ClusterClient connection established (HOST:IPv4Address(TCP, '127.0.0.1', 59611) PEER:IPv4Address(TCP, u'127.0.0.1', 38365))
2014-04-02 20:49:18-0500 [ClusterClient,client] ClusterClient connection lost (HOST:IPv4Address(TCP, '127.0.0.1', 35532) PEER:IPv4Address(TCP, u'127.0.0.1', 49852))
2014-04-02 20:49:18-0500 [ClusterClient,client] ClusterClient connection lost (HOST:IPv4Address(TCP, '127.0.0.1', 35532) PEER:IPv4Address(TCP, u'127.0.0.1', 49852))
2014-04-02 20:49:18-0500 [ClusterClient,client] Stopping factory <twisted.internet.endpoints.OneShotFactory instance at 0x7ffcf17c1680>
2014-04-02 20:49:18-0500 [ClusterClient,client] Stopping factory <twisted.internet.endpoints.OneShotFactory instance at 0x7ffcf17c1680>
2014-04-02 20:50:04-0500 [-] Received SIGTERM, shutting down.
2014-04-02 20:50:04-0500 [-] Received SIGTERM, shutting down.

##### maas.log #####
ERROR 2014-04-07 16:42:02,716 django.request Internal Server Error: /MAAS/
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 114, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.7/dist-packages/django/views/generic/base.py", line 69, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/views/generic/base.py", line 87, in dispatch
    return handler(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/views/nodes.py", line 206, in get
    return super(NodeListView, self).get(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/views/generic/list.py", line 152, in get
    context = self.get_context_data()
  File "/usr/lib/python2.7/dist-packages/maasserver/views/nodes.py", line 333, in get_context_data
    context.update(get_longpoll_context())
  File "/usr/lib/python2.7/dist-packages/maasserver/views/nodes.py", line 96, in get_longpoll_context
    'longpoll_queue': messaging.getQueue().name,
  File "/usr/lib/python2.7/dist-packages/maasserver/rabbit.py", line 77, in getQueue
    return RabbitQueue(self._session, self.exchange_name)
  File "/usr/lib/python2.7/dist-packages/maasserver/rabbit.py", line 109, in __init__
    self.queue_name = self.channel.queue_declare(
  File "/usr/lib/python2.7/dist-packages/maasserver/rabbit.py", line 90, in channel
    self._channel = self....


Changed in maas:
milestone: none → 14.10
Jeroen T. Vermeulen (jtv) wrote :

The latest build in ppa:maas-maintainers/dailybuilds should have the fix. Could you try again, but with that package?

Julian Edwards (julian-edwards) wrote : Re: [Bug 1302772] Re: update of maas-cluster-controller on trusty dumps traceback and crashes

On Monday 07 Apr 2014 16:03:39 you wrote:
> Also, to answer Julian in comment #2, Correct, I chose N when asked
> about overwriting pserv.yaml. I never allow automatic overwriting of
> config files, and I would well imagine that most people do likewise as
> that tends to break things when custom configs disappear automatically
> during upgrades...
>
> (been bitten by that more than once, so I just never do it anymore).

I do hope you merge back changes from the upstream config? MAAS makes a lot
of tweaks to them from time to time.

Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 14.10 → none
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package maas - 1.5+bzr2230-0ubuntu1

---------------
maas (1.5+bzr2230-0ubuntu1) trusty; urgency=medium

  * New upstream bugfix release:
    - Fix Cluster Controller to handle deprecated config items gracefully.
      Otherwise it fails on upgrades. (LP: #1302772)
    - Fix documentation generation and referencing. (LP: #1302956)
    - Ensure we PXE boot when we turn on SM15K systems. (LP: #1303915)
 -- Andres Rodriguez <email address hidden> Mon, 07 Apr 2014 10:26:51 -0400

Changed in maas (Ubuntu):
status: Confirmed → Fix Released
Ryan Beisner (1chb1n) wrote :

Confirmed to resolve the MAAS upgrade crash on an existing Trusty MAAS environment.

Beware however, after rev 2230 was confirmed in the repo, I was still stuck in a loop with apt/dpkg trying to install rev 2227. I couldn't complete an apt-get update (it halted, advising that dpkg --configure -a must first be run). But dpkg --configure -a tried to install rev 2227, bringing me back to the initial crash symptom.

This broke that cycle, and upgraded to 2230:
$ sudo dpkg --clear-selections
$ sudo apt-get update -y && sudo apt-get upgrade

Thank you all!

Julian Edwards (julian-edwards) wrote :

Thanks for confirming the fix, Ryan.

Changed in maas:
status: Fix Committed → Fix Released