MAAS boot resource import hangs for ~25 minutes from API call to actual image import

Bug #1899981 reported by Michael Skalka
Affects  Status   Importance  Assigned to   Milestone
MAAS     Invalid  High        Adam Collard
2.8      Invalid  Undecided   Unassigned

Bug Description

# snap list
Name      Version                 Rev   Tracking       Publisher   Notes
core18    20200724                1885  latest/stable  canonical✓  base
maas      2.8.2-8577-g.a3e674063  8980  2.8/stable     canonical✓  -
maas-cli  0.6.5                   13    latest/stable  canonical✓  -
snapd     2.47                    9607  latest/stable  canonical✓  snapd

Using a 3-node MAAS cluster stood up according to the documentation for HA'd snap MAAS.

From the MAAS VIP holder's `/var/snap/maas/common/log/regiond.log`:

2020-10-15 12:49:06 regiond: [info] 10.244.40.31 POST /MAAS/api/2.0/boot-resources/?op=import HTTP/1.1 --> 200 OK (referrer: -; agent: Python-httplib2/0.9.2 (gzip))

On the same unit's `/var/snap/maas/common/maas.log`:

2020-10-15T13:13:46+00:00 leafeon maas.import-images: [info] Downloading image descriptions from http://images.maas.io/ephemeral-v3/daily/
2020-10-15T13:13:46+00:00 leafeon maas.import-images: [info] Region downloading image descriptions from 'http://images.maas.io/ephemeral-v3/daily/'.
2020-10-15T13:13:51+00:00 leafeon maas.bootsources: [info] Updated boot sources cache.
2020-10-15T13:13:51+00:00 leafeon maas.bootresources: [info] Started importing of boot images from 1 source(s).
2020-10-15T13:13:51+00:00 leafeon maas.import-images: [info] Downloading image descriptions from http://images.maas.io/ephemeral-v3/daily/
2020-10-15T13:13:51+00:00 leafeon maas.import-images: [info] Region downloading image descriptions from 'http://images.maas.io/ephemeral-v3/daily/'.
2020-10-15T13:13:59+00:00 leafeon maas.bootresources: [info] Importing images from source: http://images.maas.io/ephemeral-v3/daily/
2020-10-15T13:14:10+00:00 leafeon maas.bootresources: [warn] Ignoring unsupported filetype(manifest) from com.ubuntu.maas.daily:v3:boot:18.04:amd64:ga-18.04 20201007
2020-10-15T13:14:10+00:00 leafeon maas.bootresources: [warn] Ignoring unsupported filetype(manifest) from com.ubuntu.maas.daily:v3:boot:18.04:amd64:ga-18.04-lowlatency 20201007
2020-10-15T13:14:10+00:00 leafeon maas.bootresources: [warn] Ignoring unsupported filetype(manifest) from com.ubuntu.maas.daily:v3:boot:18.04:amd64:hwe-18.04 20201007
2020-10-15T13:14:10+00:00 leafeon maas.bootresources: [warn] Ignoring unsupported filetype(manifest) from com.ubuntu.maas.daily:v3:boot:18.04:amd64:hwe-18.04-edge 20201007
2020-10-15T13:14:10+00:00 leafeon maas.bootresources: [warn] Ignoring unsupported filetype(manifest) from com.ubuntu.maas.daily:v3:boot:18.04:amd64:hwe-18.04-lowlatency 20201007
2020-10-15T13:14:10+00:00 leafeon maas.bootresources: [warn] Ignoring unsupported filetype(manifest) from com.ubuntu.maas.daily:v3:boot:18.04:amd64:hwe-18.04-lowlatency-edge 20201007
2020-10-15T13:14:10+00:00 leafeon maas.bootresources: [warn] Ignoring unsupported filetype(manifest) from com.ubuntu.maas.daily:v3:boot:20.04:amd64:ga-20.04 20201005
2020-10-15T13:14:10+00:00 leafeon maas.bootresources: [warn] Ignoring unsupported filetype(manifest) from com.ubuntu.maas.daily:v3:boot:20.04:amd64:ga-20.04-lowlatency 20201005

So the API call comes in, and ~25 minutes later the MAAS units catch up and start importing the images. Nothing in the rackd logs indicates that anything went wrong:

2020-10-15 12:48:58 Uninitialized: [info] ClusterClient connection established (HOST:IPv6Address(TCP, '::ffff:10.245.209.20', 52350) PEER:IPv6Address(TCP, '::ffff:10.245.209.40', 5250))
2020-10-15 12:49:02 provisioningserver.rpc.clusterservice: [info] Fully connected to all 12 event-loops on all 3 region controllers (leafeon, meinfoo, swoobat).
2020-10-15 12:49:05 provisioningserver.rpc.clusterservice: [info] Event-loop 'leafeon:pid=8868' authenticated.
2020-10-15 12:49:05 provisioningserver.rpc.clusterservice: [info] Rack controller 'nw8bgq' registered (via leafeon:pid=8868) with MAAS version 2.8.2-8577-g.a3e674063.
2020-10-15 12:51:47 provisioningserver.rackdservices.dhcp_probe_service: [info] Probe for external DHCP servers started on interfaces: eno49, broam, brinternal.
2020-10-15 12:51:57 provisioningserver.dhcp.detect: [info] External DHCP server(s) discovered on interface 'eno49': 10.245.208.5
2020-10-15 12:52:17 provisioningserver.rackdservices.dhcp_probe_service: [info] External DHCP probe complete.
2020-10-15 13:01:47 provisioningserver.rackdservices.dhcp_probe_service: [info] Probe for external DHCP servers started on interfaces: eno49, broam, brinternal.
2020-10-15 13:01:57 provisioningserver.dhcp.detect: [info] External DHCP server(s) discovered on interface 'eno49': 10.245.208.5
2020-10-15 13:02:17 provisioningserver.rackdservices.dhcp_probe_service: [info] External DHCP probe complete.
2020-10-15 13:11:47 provisioningserver.rackdservices.dhcp_probe_service: [info] Probe for external DHCP servers started on interfaces: eno49, broam, brinternal.
2020-10-15 13:11:57 provisioningserver.dhcp.detect: [info] External DHCP server(s) discovered on interface 'eno49': 10.245.208.5
2020-10-15 13:12:17 provisioningserver.rackdservices.dhcp_probe_service: [info] External DHCP probe complete.
2020-10-15 13:21:47 provisioningserver.rackdservices.dhcp_probe_service: [info] Probe for external DHCP servers started on interfaces: eno49, broam, brinternal.
2020-10-15 13:21:57 provisioningserver.dhcp.detect: [info] External DHCP server(s) discovered on interface 'eno49': 10.245.208.5
2020-10-15 13:22:17 provisioningserver.rackdservices.dhcp_probe_service: [info] External DHCP probe complete.
2020-10-15 13:31:47 provisioningserver.rackdservices.dhcp_probe_service: [info] Probe for external DHCP servers started on interfaces: eno49, broam, brinternal.
2020-10-15 13:31:57 provisioningserver.dhcp.detect: [info] External DHCP server(s) discovered on interface 'eno49': 10.245.208.5
2020-10-15 13:32:17 provisioningserver.rackdservices.dhcp_probe_service: [info] External DHCP probe complete.

So this is internal MAAS communication hanging on /something/.
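
For reference, the gap can be computed directly from the two timestamps quoted above (the POST in regiond.log and the first import line in maas.log). A minimal Python sketch, assuming the regiond.log timestamp is in UTC like the maas.log one (regiond.log does not print an offset):

```python
from datetime import datetime, timezone

# Timestamps copied from the log excerpts in this report.
# Assumption: regiond.log is in UTC, matching the +00:00 offset in maas.log.
api_call = datetime.strptime(
    "2020-10-15 12:49:06", "%Y-%m-%d %H:%M:%S"
).replace(tzinfo=timezone.utc)
import_start = datetime.fromisoformat("2020-10-15T13:13:46+00:00")

# Time between the API call and the first import-images log line.
gap = import_start - api_call
minutes, seconds = divmod(int(gap.total_seconds()), 60)
print(f"{minutes} min {seconds} s")
```

That works out to 24 minutes 40 seconds, which is the "~25 minutes" in the summary.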

Tags: cdo-qa
Michael Skalka (mskalka) wrote :

Subbed field-high as this is blocking ongoing SQA testing.

tags: added: cdo-qa
summary: - [2.8] MAAS boot resource import hangs for ~25 minutes from API call to
- actual image import
+ MAAS boot resource import hangs for ~25 minutes from API call to actual
+ image import
Changed in maas:
assignee: nobody → Adam Collard (adam-collard)
importance: Undecided → High
Adam Collard (adam-collard) wrote :

Interestingly, there was some download occurring before 12:49 (see lines 28-32 of https://pastebin.canonical.com/p/mpVdHQ8hC5/).

Michael Skalka (mskalka) wrote :

Marked invalid and un-subbed. Root-caused to a networking issue with images.maas.io
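
For anyone seeing a similar delay, one way to check whether the image source itself is slow or unreachable from a region controller is to time a plain HTTP fetch of the stream URL shown in the logs. A rough Python sketch; `time_fetch` is a hypothetical helper written for illustration, not part of MAAS, and it only tests basic reachability, not what MAAS does internally during an import:

```python
import time
import urllib.request


def time_fetch(url, timeout=10.0, opener=urllib.request.urlopen):
    """Fetch url once and return (ok, elapsed_seconds, detail).

    Hypothetical helper for illustration only. `opener` defaults to
    urllib.request.urlopen and can be swapped out for testing.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            resp.read()  # read the full body so transfer time is included
            ok, detail = True, f"HTTP {resp.status}"
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        ok, detail = False, repr(exc)
    return ok, time.monotonic() - start, detail
```

Running `time_fetch("http://images.maas.io/ephemeral-v3/daily/")` on each region controller and comparing the elapsed times may help show whether the network path to images.maas.io is the bottleneck.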

Changed in maas:
status: New → Invalid