[2.4] INTERNAL_SERVER_ERROR on various operations without crash logs, which seemingly continue to work (e.g. release, allocate)

Bug #1761530 reported by Andres Rodriguez on 2018-04-05
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
MAAS
High
Blake Rouse

Bug Description

There's two situations where logs are being suppressed:

# As per Jasons bug.

In maas 2.4, we're seeing lots of 500 responses to allocate requests. In those requests, maas is setting the node to ALLOCATED but is returning a 500 code.

Here's two examples from the same machine and same run:

http://paste.ubuntu.com/p/PDnYvnhmrP/

We've seen this on:
maas_2.4.0~beta3-6863-ge11fd5e-0ubuntu1~18.04.1~20180418~ubuntu18.04.1

and on:
2.4.0~beta2-6865-gec43e47e6-0ubuntu1

We're also seeing 500 responses on "release" operations and metadata posts in the same runs.

We believe these 500 errors are causing issues with juju not being able to recognize which machine's it's successfully allocated.

# As per CI.

======================================================================
ERROR: maas-integration.TestMAASIntegration.test_region_rack_connected
----------------------------------------------------------------------
testtools.testresult.real._StringException: Empty attachments:
  stderr for maas maas ['rack-controllers', 'read']

retcode for maas maas ['rack-controllers', 'read']: {{{2}}}
stdout for maas maas ['rack-controllers', 'read']: {{{
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Error: Internal server error</title>
  </head>
  <body>
    <h2>
      Internal server error.
    </h2>
  </body>
</html>
}}}

Traceback (most recent call last):
  File "/tmp/autopkgtest.W1TiDa/build.0iu/real-tree/debian/tests/timeout.py", line 22, in wrapper
    result = func(*args, **kwargs)
  File "/tmp/autopkgtest.W1TiDa/build.0iu/real-tree/debian/tests/common.py", line 682, in test_region_rack_connected
    rack_id = self.get_master_rack()['system_id']
  File "/tmp/autopkgtest.W1TiDa/build.0iu/real-tree/debian/tests/common.py", line 663, in get_master_rack
    rack_controllers = loads(output)
  File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Related branches

Changed in maas:
milestone: none → 2.4.0beta2
importance: Undecided → Critical
status: New → Triaged
Changed in maas:
milestone: 2.4.0beta2 → 2.4.0beta3
importance: Critical → High
Andres Rodriguez (andreserl) wrote :

Ok, I can confirm this is the case. For example, we see a lot of INTERNAL_SERVER_ERROR on the regiond.log but we dont see tracebacks:

10.244.40.30/var/log/maas/regiond.log:2018-04-19 04:57:21 regiond: [info] 10.244.40.32 POST /MAAS/api/2.0/machines/?op=allocate HTTP/1.1 --> 500 INTERNAL_SERVER_ERROR (referrer: -; agent: Go-http-client/1.1)
10.244.40.30/var/log/maas/regiond.log:2018-04-19 04:57:32 regiond: [info] 10.244.40.32 POST /MAAS/api/2.0/machines/?op=allocate HTTP/1.1 --> 500 INTERNAL_SERVER_ERROR (referrer: -; agent: Go-http-client/1.1)
10.244.40.30/var/log/maas/regiond.log:2018-04-19 04:58:52 regiond: [info] 10.244.40.32 POST /MAAS/api/2.0/machines/?op=release HTTP/1.1 --> 500 INTERNAL_SERVER_ERROR (referrer: -; agent: Go-http-client/1.1)
10.244.40.30/var/log/maas/regiond.log:2018-04-19 04:59:06 regiond: [info] 10.244.40.32 POST /MAAS/api/2.0/machines/?op=allocate HTTP/1.1 --> 500 INTERNAL_SERVER_ERROR (referrer: -; agent: Go-http-client/1.1)
10.244.40.31/var/log/maas/regiond.log:2018-04-19 04:16:29 regiond: [info] 10.244.40.32 POST /MAAS/metadata/2012-03-01/ HTTP/1.1 --> 500 INTERNAL_SERVER_ERROR (referrer: -; agent: Python-urllib/3.5)
10.244.40.31/var/log/maas/regiond.log:2018-04-19 04:26:26 regiond: [info] 10.244.40.32 POST /MAAS/metadata/2012-03-01/ HTTP/1.1 --> 500 INTERNAL_SERVER_ERROR (referrer: -; agent: Python-urllib/3.5)

Changed in maas:
milestone: 2.4.0beta3 → 2.4.0rc1
summary: - [2.4] Internal Server error has no logs in /var/log/maas/*.log
+ [2.4] INTERNAL_SERVER_ERROR has no logs in /var/log/maas/*.log

Hmm, just manually adding an error causes a traceback in the logs.

2018-04-19 14:53:16 regiond: [info] 127.0.0.1 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
2018-04-19 14:53:27 maasserver: [error] ################################ Exception: ################################
2018-04-19 14:53:27 maasserver: [error] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/utils/views.py", line 196, in view_atomic_with_post_commit_savepoint
    return view_atomic(*args, **kwargs)
  File "/usr/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/usr/lib/python3/dist-packages/maasserver/api/support.py", line 70, in __call__
    response = upcall(request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/django/views/decorators/vary.py", line 21, in inner_func
    response = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/piston3/resource.py", line 190, in __call__
    result = self.error_handler(e, request, meth, em_format)
  File "/usr/lib/python3/dist-packages/piston3/resource.py", line 188, in __call__
    result = meth(request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/api/support.py", line 310, in dispatch
    return function(self, request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/api/machines.py", line 1616, in allocate
    raise ValueError()
ValueError

2018-04-19 14:53:27 regiond: [info] 127.0.0.1 POST /MAAS/api/2.0/machines/?op=allocate HTTP/1.1 --> 500 INTERNAL_SERVER_ERROR (referrer
: -; agent: Python-httplib2/0.9.2 (gzip))

summary: - [2.4] INTERNAL_SERVER_ERROR has no logs in /var/log/maas/*.log
+ [2.4] INTERNAL_SERVER_ERROR on various operations without crash logs and
+ seemingly continue to work
summary: - [2.4] INTERNAL_SERVER_ERROR on various operations without crash logs and
- seemingly continue to work
+ [2.4] INTERNAL_SERVER_ERROR on various operations without crash logs,
+ which seemingly continue to work (e.g. release, allocate)
tags: added: cdo-qa cdo-qa-blocker foundations-engine
description: updated
Changed in maas:
assignee: nobody → Blake Rouse (blake-rouse)
Chris Gregan (cgregan) on 2018-04-19
tags: added: cdo-release-blocker
Changed in maas:
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 2.4.0rc1 → 2.4.0beta3
Changed in maas:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers