[2.3] 500 error from maas metadata while running hardware tests

Bug #1747548 reported by Brad Marshall
20
This bug affects 4 people
Affects Status Importance Assigned to Milestone
MAAS
Expired
Undecided
Unassigned

Bug Description

I'm trying to deploy multiple servers with 24+ drives. During the hardware tests I'm seeing the attempted posts to the MAAS metadata server getting 500 errors as per:

Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 2.: HTTP Error 500: Internal Server Error
Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 2.: HTTP Error 500: Internal Server Error
Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 4.: HTTP Error 500: Internal Server Error
Feb 5 23:37:53 hostname cloud-init[3382]: message repeated 12 times: [ request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 4.: HTTP Error 500: Internal Server Error]
Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 1.: HTTP Error 500: Internal Server Error
Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 2.: HTTP Error 500: Internal Server Error
Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 1.: HTTP Error 500: Internal Server Error
Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 4.: HTTP Error 500: Internal Server Error
Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 8.: HTTP Error 500: Internal Server Error
Feb 5 23:37:53 hostname cloud-init[3382]: message repeated 10 times: [ request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 8.: HTTP Error 500: Internal Server Error]
Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 1.: HTTP Error 500: Internal Server Error
Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 16.: HTTP Error 500: Internal Server Error
Feb 5 23:37:53 hostname cloud-init[3382]: message repeated 6 times: [ request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 16.: HTTP Error 500: Internal Server Error]
Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 1.: HTTP Error 500: Internal Server Error
Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 16.: HTTP Error 500: Internal Server Error
Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 1.: HTTP Error 500: Internal Server Error
Feb 5 23:37:53 hostname cloud-init[3382]: request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 32.: HTTP Error 500: Internal Server Error
Feb 5 23:37:54 hostname cloud-init[3382]: message repeated 5 times: [ request to http://x.y.z.2:5240/MAAS/metadata//2012-03-01/ failed. sleeping 32.: HTTP Error 500: Internal Server Error]
Feb 5 23:37:54 hostname cloud-init[3382]: FAIL: HTTP error [500]FAIL: HTTP error [500]FAIL: HTTP error [500]FAIL: HTTP error [500]FAIL: HTTP error [500]FAIL: HTTP error [500]Downloading and extracting http://x.y.z.2:5240/MAAS/metadata//2012-03-01//maas-scripts/ to /tmp/user_data.sh.ZNgBwi/scripts

This is causing the drives to fail hardware validation, as they are unable to complete in time. I can't find any details in the logs as to what is going on to cause the 500 errors.

How can I stop this from occurring, or debug what is going on here?

Please let me know if you need any further information.

$ dpkg-query -W maas
maas 2.3.0-6434-gd354690-0ubuntu1~17.10.1

$ lsb_release -d
Description: Ubuntu 17.10

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Hi Brad,

Can you confirm that your storage devices have been discovered successfully? Please attach the following:

1. all commissioning script output
2. all maas logs
3. confirm whether the disks were discovered during commissioning
4. how many nodes at the time are you commissioning?

Changed in maas:
status: New → Incomplete
milestone: none → 2.4.x
summary: - 500 error from maas metadata while running hardware tests
+ [2.3] 500 error from maas metadata while running hardware tests
Revision history for this message
Brad Marshall (brad-marshall) wrote :

> Can you confirm that your storage devices have been discovered successfully?

Yes, MAAS is seeing all the expected drives.

> 1. all commissioning script output
> 2. all maas logs

I've attached the cloud-init-output.log from the server being commissioned, as well as an extract of the regiond.log. The whole maas dir is about 100M, so I don't see much point in attaching it all, please let me know if you need any more specific files.

> 3. confirm whether the disks were discovered during commissioning

All disks were discovered during commissioning.

> 4. how many nodes at the time are you commissioning?

While testing I'm doing one at a time.

Revision history for this message
Brad Marshall (brad-marshall) wrote :

Attaching cloud-init-output.log file for commissioned machine.

Changed in maas:
status: Incomplete → New
Changed in maas:
importance: Undecided → High
Revision history for this message
Brad Marshall (brad-marshall) wrote :

I've also noticed if I run it a few times it sometimes eventually completes, so I'm wondering if this is some kind of timing issue, or race condition or something.

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Hi Brad,

Are you still running into this issue ? Have yo, by any change, tried MAAS 2.3.2 ?

Thanks!

Changed in maas:
status: New → Incomplete
milestone: 2.4.x → 2.4.0rc1
Changed in maas:
milestone: 2.4.0rc1 → 2.4.0rc2
Revision history for this message
Chris Johnston (cjohnston) wrote :

I'm seeing this on 2.5.2

[ 101.502292] cloud-init[879]: request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 1.: HTTP Error 500: Internal Server Error
[ 101.510192] cloud-init[879]: request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 1.: HTTP Error 500: Internal Server Error
[ 101.514535] cloud-init[879]: request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 2.: HTTP Error 500: Internal Server Error
[ 101.518098] cloud-init[879]: request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 4.: HTTP Error 500: Internal Server Error
[ 101.521207] cloud-init[879]: request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 8.: HTTP Error 500: Internal Server Error
[ 101.524125] cloud-init[879]: request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 16.: HTTP Error 500: Internal Server Error
[ 101.527232] cloud-init[879]: request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 32.: HTTP Error 500: Internal Server Error
[ 221.444867] cloud-init[879]: FAIL: HTTP error [500]request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 1.: HTTP Error 500: Internal Server Error
[ 221.460416] cloud-init[879]: request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 1.: HTTP Error 500: Internal Server Error
[ 221.465171] cloud-init[879]: request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 2.: HTTP Error 500: Internal Server Error
[ 221.472798] cloud-init[879]: request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 4.: HTTP Error 500: Internal Server Error
[ 221.479724] cloud-init[879]: request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 8.: HTTP Error 500: Internal Server Error
[ 221.489743] cloud-init[879]: request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 16.: HTTP Error 500: Internal Server Error
[ 221.498090] cloud-init[879]: request to http://10.10.0.10:5248/MAAS/metadata//2012-03-01/ failed. sleeping 32.: HTTP Error 500: Internal Server Error

Revision history for this message
Marek Grudzinski (ivve) wrote :

Hi, I'm seeing this on 2.6.0 during deployment of 1 machine (at a time).

Revision history for this message
Blake Rouse (blake-rouse) wrote :

@ivve - please provide the regiond.log that will show the traceback to help diagnose the issue.

Revision history for this message
Björn Tillenius (bjornt) wrote :

It's not possible to see what's going on with the attached regiond.log.

Could you please reproduce this with debug turned on, and upload all the MAAS logs?

Changed in maas:
status: Incomplete → New
importance: High → Undecided
milestone: 2.4.0rc2 → none
status: New → Incomplete
Revision history for this message
Georg Schreiber (gschreiber2020) wrote :

Hello,
I've recently installed snap maas 2.7 and encounter the same problem:
2020-03-28T13:23:50+00:00 S4 cloud-init[1090]: Failed installing package(s) for smartctl-validate (id: 98, script_version_id: 1)
2020-03-28T13:24:55+00:00 S4 cloud-init[1090]: request to http://192-168-1-0--24.maas-internal:5248/MAAS/metadata/2012-03-01/ failed. sleeping 1.: HTTP Error 500: Internal Server Error
2020-03-28T13:24:55+00:00 S4 cloud-init[1090]: request to http://192-168-1-0--24.maas-internal:5248/MAAS/metadata/2012-03-01/ failed. sleeping 1.: HTTP Error 500: Internal Server Error

Some lines before, systemd-resovled complains:
2020-03-28T13:23:39+00:00 S4 systemd-resolved[635]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.

I attach regiond.log, rackd.log (because regiond complains not to be able to contact rack)
And if you tell me to turn debug on, I'm happy to provide you with the debug"abled" logs, as well.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.