[UI] MAAS should handle better errors in curtin preseeds

Bug #1548402 reported by Jeff Lane 
This bug affects 1 person
Affects: MAAS
Status: Fix Released
Importance: Wishlist
Assigned to: Andres Rodriguez

Bug Description

MAAS curtin preseed should be rendering the curtin_userdata file correctly, and not assuming that the user will use the correct encoding. From the user perspective, this should all be 'str' and not 'byte-encoded'.

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 132, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/utils/views.py", line 180, in view_atomic_with_post_commit_savepoint
    return view_atomic(*args, **kwargs)
  File "/usr/lib/python3.5/contextlib.py", line 30, in inner
    return func(*args, **kwds)
  File "/usr/lib/python3/dist-packages/maasserver/api/support.py", line 54, in __call__
    response = upcall(request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/django/views/decorators/vary.py", line 21, in inner_func
    response = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/piston3/resource.py", line 190, in __call__
    result = self.error_handler(e, request, meth, em_format)
  File "/usr/lib/python3/dist-packages/piston3/resource.py", line 188, in __call__
    result = meth(request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/api/support.py", line 202, in dispatch
    return function(self, request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/metadataserver/api.py", line 681, in read
    user_data = get_curtin_userdata(node)
  File "/usr/lib/python3/dist-packages/maasserver/preseed.py", line 269, in get_curtin_userdata
    configs=get_curtin_yaml_config(node),
  File "/usr/lib/python3/dist-packages/maasserver/preseed.py", line 216, in get_curtin_yaml_config
    main_config = get_curtin_config(node)
  File "/usr/lib/python3/dist-packages/maasserver/preseed.py", line 342, in get_curtin_config
    return template.substitute(**context)
  File "/usr/lib/python3/dist-packages/tempita/__init__.py", line 173, in substitute
    result, defs, inherit = self._interpret(ns)
  File "/usr/lib/python3/dist-packages/tempita/__init__.py", line 184, in _interpret
    self._interpret_codes(self._parsed, ns, out=parts, defs=defs)
  File "/usr/lib/python3/dist-packages/tempita/__init__.py", line 212, in _interpret_codes
    self._interpret_code(item, ns, out, defs)
  File "/usr/lib/python3/dist-packages/tempita/__init__.py", line 218, in _interpret_code
    self._exec(code[2], ns, pos)
  File "/usr/lib/python3/dist-packages/tempita/__init__.py", line 320, in _exec
    raise exc_info[0](e).with_traceback(exc_info[2])
  File "/usr/lib/python3/dist-packages/tempita/__init__.py", line 312, in _exec
    exec(code, self.default_namespace, ns)
  File "<string>", line 7, in <module>
TypeError: a bytes-like object is required, not 'str' at line 2 column 3 in file /etc/maas/preseeds/curtin_userdata

=========================================================================
I have a MAAS server running Xenial and MAAS 1.10 from maas/next. The server has 14.04 images from the Release stream.

On the node, there are three NICs and all three are set to AutoAssign and are assigned 10.0.0.125, 126 and 127.

On boot, the installer ephemeral boots but then cloud-init tries to get data from 169.254.169.254. It does this for 120 seconds, THEN tries calling 10.0.0.1 which also fails, and then it finally just gives up.

At this point the node is up but stuck, and I had to backdoor it to get logs.

The logs from the node are attached; the biggest thing I noticed was this:

2016-02-22 16:50:25,132 - util.py[WARNING]: Failed fetching metadata from url http://10.0.0.1/MAAS/metadata/curtin
2016-02-22 16:50:25,133 - util.py[DEBUG]: Failed fetching metadata from url http://10.0.0.1/MAAS/metadata/curtin
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cloudinit/sources/DataSourceMAAS.py", line 84, in get_data
    paths=self.paths)
  File "/usr/lib/python2.7/dist-packages/cloudinit/sources/DataSourceMAAS.py", line 236, in read_maas_seed_url
    ssl_details=ssl_details)
  File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 704, in read_file_or_url
    exception_cb=exception_cb)
  File "/usr/lib/python2.7/dist-packages/cloudinit/url_helper.py", line 257, in readurl
    raise excps[-1]
UrlError: 500 Server Error: INTERNAL SERVER ERROR

However, when I query that metadata URL after getting in via the backdoor, it seems to work:

backdoor@ubuntu:~$ echo $url
http://10.0.0.1/MAAS/metadata/curtin
backdoor@ubuntu:~$ sudo python $maasds --config=$cfg get $url
== http://10.0.0.1/MAAS/metadata/curtin ==
2012-03-01
latest

backdoor@ubuntu:~$ sudo python $maasds --config=$cfg crawl $url/latest/meta-data/
== http://10.0.0.1/MAAS/metadata/curtin/latest/meta-data/instance-id ==
node-9c831f44-d6b9-11e5-8cc4-eca86bfb9f66

== http://10.0.0.1/MAAS/metadata/curtin/latest/meta-data/local-hostname ==
x-wing.maas

== http://10.0.0.1/MAAS/metadata/curtin/latest/meta-data/public-keys ==
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD0UpxLhAfnGngLGdgK720nezbgSKgYfV8WyNA4f2X6ATtMEt28Kx7UdO4udhBeINbHlfNrt9ddm+VrKC4MhPGeKLna71IKu07UUbZVWz0kHd8+gwpeoc7VA8p02ZSbXcXBSLhDBEWa8ly2opmuFvs6jG2UdusAUe2ooskTL+itRE68QBD2um90MgbmM5efYayGX97c++0ogxM21osSjpapiJrXap2zUDolq0IDVQA0YujyGw85BgojmuaSvLDvnynic610Ogkrd00TSAniZ8h18C5xZur8Ex1yfa/p4h87AH/Equ0fblbSxJkhP59HUnYiaUvGJ3JXb4YQcEX2UJMb bladernr@critical-maas

== http://10.0.0.1/MAAS/metadata/curtin/latest/meta-data/x509 ==

backdoor@ubuntu:~$ sudo python $maasds --config=$cfg crawl $url/2012-03-01/meta-data/
== http://10.0.0.1/MAAS/metadata/curtin/2012-03-01/meta-data/instance-id ==
node-9c831f44-d6b9-11e5-8cc4-eca86bfb9f66

== http://10.0.0.1/MAAS/metadata/curtin/2012-03-01/meta-data/local-hostname ==
x-wing.maas

== http://10.0.0.1/MAAS/metadata/curtin/2012-03-01/meta-data/public-keys ==
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD0UpxLhAfnGngLGdgK720nezbgSKgYfV8WyNA4f2X6ATtMEt28Kx7UdO4udhBeINbHlfNrt9ddm+VrKC4MhPGeKLna71IKu07UUbZVWz0kHd8+gwpeoc7VA8p02ZSbXcXBSLhDBEWa8ly2opmuFvs6jG2UdusAUe2ooskTL+itRE68QBD2um90MgbmM5efYayGX97c++0ogxM21osSjpapiJrXap2zUDolq0IDVQA0YujyGw85BgojmuaSvLDvnynic610Ogkrd00TSAniZ8h18C5xZur8Ex1yfa/p4h87AH/Equ0fblbSxJkhP59HUnYiaUvGJ3JXb4YQcEX2UJMb bladernr@critical-maas

== http://10.0.0.1/MAAS/metadata/curtin/2012-03-01/meta-data/x509 ==

Revision history for this message
Jeff Lane  (bladernr) wrote :

maas:
  Installed: 1.10.0+bzr4578-0ubuntu2
  Candidate: 1.10.0+bzr4578-0ubuntu2
  Version table:
 *** 1.10.0+bzr4578-0ubuntu2 500
        500 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 Packages
        500 http://us.archive.ubuntu.com/ubuntu xenial/main i386 Packages
        100 /var/lib/dpkg/status
     1.10.0+bzr4578-0ubuntu2 500
        500 http://ppa.launchpad.net/maas/next/ubuntu xenial/main amd64 Packages
        500 http://ppa.launchpad.net/maas/next/ubuntu xenial/main i386 Packages

Revision history for this message
Andres Rodriguez (andreserl) wrote :

We are currently looking into the issue, but we have been unable to reproduce it so far.

Changed in maas:
status: New → Incomplete
Revision history for this message
Jeff Lane  (bladernr) wrote :

So first, I thought it might be because the node was booting from a shared AMT/data port. I went in and disabled that data port so that now only the two onboard Intel NICs are used for PXE.

I tried again, and the same failure occurs. It tries 10.0.0.1 and gets a 500 Internal Server Error, then falls back to the link-local address; that fails and it retries 10.0.0.1 again.

It could be a networking issue, but then again, I am able to successfully ssh to the node to get the logs... attached are the logs from the node on the 2nd attempt.

Revision history for this message
Jeff Lane  (bladernr) wrote :

Node cloud-init logs from failure AFTER disabling the shared port.

Revision history for this message
Jeff Lane  (bladernr) wrote :

Also a tarball of /var/log/maas and /var/log/apache2

Changed in maas:
status: Incomplete → New
summary: - unable to deploy nodes w/ 14.04 and MAAS 1.10 on Xenial
+ TypeError: a bytes-like object is required, not 'str' at line 2 column 3
+ in file /etc/maas/preseeds/curtin_userdata
description: updated
Revision history for this message
Jeff Lane  (bladernr) wrote :

Andres says the issue is in the curtin_userdata file... attaching that.

It should be noted that this curtin_userdata file works fine on 1.9/Trusty MAAS systems.

The default curtin_userdata works fine.

Revision history for this message
Jeff Lane  (bladernr) wrote : Re: TypeError: a bytes-like object is required, not 'str' at line 2 column 3 in file /etc/maas/preseeds/curtin_userdata

FWIW, for our curtin setup this is an easy fix: decode the output of check_output in the py section of our curtin_userdata file.

cache_output = check_output(['apt-cache', 'policy', 'maas']).decode('utf-8')
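
For anyone hitting the same TypeError, here is a minimal sketch of why the decode is needed. Under Python 3, check_output returns bytes, so a str test against it fails with the message seen in the traceback. The child command below is a stand-in (the Python interpreter printing a made-up "Installed:" line) so the sketch runs without apt-cache; it is not the actual content of our curtin_userdata.

```python
import subprocess
import sys

# Stand-in for check_output(['apt-cache', 'policy', 'maas']): a child
# process that prints one made-up line, so the sketch runs anywhere.
raw = subprocess.check_output(
    [sys.executable, "-c", "print('Installed: 1.10.0')"]
)

# Under Python 3 the captured output is bytes, not str.
assert isinstance(raw, bytes)

# Testing a str against bytes raises the TypeError from the traceback.
error_message = None
try:
    "Installed" in raw
except TypeError as exc:
    error_message = str(exc)

# Decoding first gives a str, and the same test then works.
text = raw.decode("utf-8")
found = "Installed" in text
```

Passing universal_newlines=True to check_output is another way to get str back instead of bytes.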

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Oh, I see now. You are running your own Python script inside the curtin preseed, and preseed rendering is failing because the script itself is not Python 3 compatible. This bug has therefore become invalid per se; however, MAAS should have better error handling!
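
A rough sketch of the kind of error handling meant here, not the change that actually landed in MAAS: wrap the render call and re-raise with the template name attached, so the caller can report something more useful than a bare 500. PreseedRenderError, render_preseed and broken_template are hypothetical names made up for illustration.

```python
class PreseedRenderError(Exception):
    """Hypothetical error type carrying the name of the failing template."""


def render_preseed(render, template_name, context):
    """Call render(**context); on failure, re-raise with the template name.

    `render` stands in for whatever callable renders the template.
    """
    try:
        return render(**context)
    except Exception as exc:
        raise PreseedRenderError(
            "Failed to render preseed template %r: %s: %s"
            % (template_name, type(exc).__name__, exc)
        ) from exc


def broken_template(**context):
    # Mimics a user script that is not Python 3 compatible:
    # a str is tested against bytes, which raises TypeError.
    return "maas" in b"maas: installed"


message = None
try:
    render_preseed(broken_template, "curtin_userdata", {"node": "x-wing"})
except PreseedRenderError as exc:
    message = str(exc)
```

The original exception is kept as the cause via "from exc", so the full traceback is still available in the server logs.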

Changed in maas:
importance: Undecided → Critical
milestone: none → 2.0.0
summary: - TypeError: a bytes-like object is required, not 'str' at line 2 column 3
- in file /etc/maas/preseeds/curtin_userdata
+ MAAS should handle better errors in curtin preseeds
Gavin Panella (allenap)
Changed in maas:
status: New → Triaged
no longer affects: maas/1.10
summary: - MAAS should handle better errors in curtin preseeds
+ [UI] MAAS should handle better errors in curtin preseeds
Changed in maas:
assignee: nobody → Newell Jensen (newell-jensen)
Changed in maas:
importance: Critical → Wishlist
Changed in maas:
milestone: 2.0.0 → 2.1.0
Changed in maas:
milestone: 2.1.0 → 2.1.1
Changed in maas:
milestone: 2.1.1 → 2.1.2
Changed in maas:
milestone: 2.1.2 → 2.1.3
Changed in maas:
milestone: 2.1.3 → 2.2.0
Changed in maas:
milestone: 2.2.0 → 2.2.x
Changed in maas:
milestone: 2.2.x → 2.3.0
assignee: Newell Jensen (newell-jensen) → nobody
Changed in maas:
assignee: nobody → Andres Rodriguez (andreserl)
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 2.3.0 → 2.3.0alpha1
Changed in maas:
status: Fix Committed → Fix Released