Using exact internal replica of archive.ubuntu.com, commission phase fails to complete due to failure to pull apt packages

Bug #1851276 reported by Jim Conner
This bug affects 1 person

Affects   Status    Importance  Assigned to  Milestone
MAAS      Triaged   Medium      Unassigned
2.6       Invalid   Medium      Unassigned
maas-ui   New       Unknown

Bug Description

Our maas environment is 2.6.1

I will give context into the environment in order to assist any engineers' attempts at reproducing this bug.

The environment was first brought up with full egress Internet connectivity. All machines in the node pool were initially commissioned and deployed using the default, externally hosted Ubuntu repos at us.archive.ubuntu.com.

Months later, a project required this lab to be air-gapped. All egress connectivity to the Internet was blocked.

We realized that the package repository and the deploy/commission of machines were tightly coupled, which we really didn't know until egress was turned down.

During our various tests to get an internal repo properly mirrored, I stumbled into a bug where failed deploying machines set to rescue mode would not easily exit rescue mode. At first, I deleted machines (currently two of them) to get them out of rescue mode until I found/learned a couple tricks to remove them from rescue mode.

So now I have two machines which require adding to the node pool again. This is where things get hairy. However, I should mention that we now have a fully replicated apt repo in our environment which is an identical replica of archive.ubuntu.com (rsynced).

Currently:

  * maas setup to use internal apt repo; an identical replica of archive.ubuntu.com
  * maas 2.6.1
  * existing commissioned machines are deployed and released continually with no errors.
  * only two machines need to be re-added to maas and
  * the two machines needing to be recommissioned fail to commission 100% of the time.

If I were to offer an observation or two of what the root cause might be:
  1. either apt update is not happening in the commission stage prior to apt install or
  2. the failed call to the api.snapcraft.io is volatile and we need to mirror api.snapcraft.io in our air-gapped environment.

Snippet of what I've been able to capture for the failure:

{code}stateengine.go:102: state ensure error: Get https://api.snapcraft.io/api/v1/snaps/sections: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers){code}

and possibly

{code}2019-10-25T22:38:12+00:00 maas-enlisting-node cloud-init[2128]: Generating locales (this might take a while)...
2019-10-25T22:38:13+00:00 maas-enlisting-node cloud-init[2128]: en_US.UTF-8... done
2019-10-25T22:38:13+00:00 maas-enlisting-node cloud-init[2128]: Generation complete.
2019-10-25T22:38:13+00:00 maas-enlisting-node cloud-init[2128]: Cloud-init v. 18.5-45-g3554ffe8-0ubuntu1~16.04.1 running 'modules:config' at Fri, 25 Oct 2019 22:38:12 +0000. Up 64.78 seconds.
2019-10-25T22:38:13+00:00 maas-enlisting-node systemd[1]: Started Apply the settings specified in cloud-config.
2019-10-25T22:38:13+00:00 maas-enlisting-node systemd[1]: Starting Execute cloud user/final scripts...
2019-10-25T22:38:14+00:00 maas-enlisting-node cloud-init[2202]: Hit:1 http://archive.ubuntu.com/ubuntu xenial InRelease
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: Reading package lists...
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: Reading package lists...
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: Building dependency tree...
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: Reading state information...
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: Package ipmitool is not available, but is referred to by another package.
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: This may mean that the package is missing, has been obsoleted, or
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: is only available from another source
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: E: Package 'ipmitool' has no installation candidate
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: E: Unable to locate package sshpass
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: E: Unable to locate package jq
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: Cloud-init v. 18.5-45-g3554ffe8-0ubuntu1~16.04.1 running 'modules:final' at Fri, 25 Oct 2019 22:38:14 +0000. Up 66.65 seconds.
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: 2019-10-25 22:38:15,360 - util.py[WARNING]: Failed to install packages: ['python3-yaml', 'python3-oauthlib', 'freeipmi-tools', 'ipmitool', 'sshpass', 'archdetect-deb', 'jq']
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: 2019-10-25 22:38:15,370 - cc_package_update_upgrade_install.py[WARNING]: 1 failed with exceptions, re-raising the last one
2019-10-25T22:38:15+00:00 maas-enlisting-node cloud-init[2202]: 2019-10-25 22:38:15,371 - util.py[WARNING]: Running module package-update-upgrade-install (<module 'cloudinit.config.cc_package_up{code}

which led to:

{code}2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: Traceback (most recent call last):
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: File "/tmp/user_data.sh.at67mX/bin/maas-ipmi-autodetect-tool", line 57, in <module>
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: main()
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: File "/tmp/user_data.sh.at67mX/bin/maas-ipmi-autodetect-tool", line 50, in main
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: if is_host_moonshot():
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: File "/tmp/user_data.sh.at67mX/bin/maas-ipmi-autodetect-tool", line 36, in is_host_moonshot
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: output = subprocess.check_output(['ipmitool', 'raw', '06', '01'])
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: **kwargs).stdout
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: File "/usr/lib/python3.5/subprocess.py", line 693, in run
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: with Popen(*popenargs, **kwargs) as process:
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: restore_signals, start_new_session)
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: File "/usr/lib/python3.5/subprocess.py", line 1551, in _execute_child
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: raise child_exception_type(errno_num, err_msg)
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: FileNotFoundError: [Errno 2] No such file or directory: 'ipmitool'
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: % Total % Received % Xferd Average Speed Time Time Time Current
2019-10-25T22:38:16+00:00 maas-enlisting-node cloud-init[2202]: Dload Upload Total Spent Left Speed
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: #015 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0#015100 988 100 638 100 350 2334 1280 --:--:-- --:--:-- --:--:-- 2336
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: /var/lib/cloud/instance/scripts/user_data.sh: line 192: jq: command not found
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: /var/lib/cloud/instance/scripts/user_data.sh: line 200: jq: command not found
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: Traceback (most recent call last):
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: File "/tmp/user_data.sh.at67mX/bin/maas-signal", line 105, in <module>
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: main()
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: File "/tmp/user_data.sh.at67mX/bin/maas-signal", line 84, in main
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: read_config(args.config, creds)
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: File "/tmp/user_data.sh.at67mX/bin/maas_api_helper.py", line 196, in read_config
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: if 'datasource' in cfg:
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: TypeError: argument of type 'NoneType' is not iterable
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: Traceback (most recent call last):
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: File "/tmp/user_data.sh.at67mX/bin/maas-run-remote-scripts", line 790, in <module>
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: main()
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: File "/tmp/user_data.sh.at67mX/bin/maas-run-remote-scripts", line 736, in main
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: read_config(args.config, creds)
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: File "/tmp/user_data.sh.at67mX/bin/maas_api_helper.py", line 196, in read_config
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: if 'datasource' in cfg:
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: TypeError: argument of type 'NoneType' is not iterable
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: 2019-10-25 22:38:17,402 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/user_data.sh [1]
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: 2019-10-25 22:38:17,403 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2019-10-25T22:38:17+00:00 maas-enlisting-node cloud-init[2202]: 2019-10-25 22:38:17,404 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed
{code}
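
To turn a log like the one above into a short list of what apt could not resolve, a small parser can help. This is only a sketch: the function name `missing_packages` is hypothetical, and the two patterns cover just the error forms that appear in this report.

```python
import re

# Patterns for the two apt error forms that appear in the log above.
NO_CANDIDATE = re.compile(r"E: Package '([^']+)' has no installation candidate")
NOT_FOUND = re.compile(r"E: Unable to locate package (\S+)")


def missing_packages(log_text):
    """Return package names apt could not resolve, in order of first appearance."""
    found = []
    for line in log_text.splitlines():
        for pattern in (NO_CANDIDATE, NOT_FOUND):
            match = pattern.search(line)
            if match and match.group(1) not in found:
                found.append(match.group(1))
    return found


# Three lines taken from the log above (timestamps trimmed).
sample = """\
cloud-init[2202]: E: Package 'ipmitool' has no installation candidate
cloud-init[2202]: E: Unable to locate package sshpass
cloud-init[2202]: E: Unable to locate package jq
"""
print(missing_packages(sample))  # ['ipmitool', 'sshpass', 'jq']
```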

Tags: ui
Revision history for this message
Alberto Donato (ack) wrote :

Hi, how did you configure MAAS to point to your mirror?
Looking at the logs, the machines being deployed are still trying to reach archive.ubuntu.com.
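
A quick way to check which archive hosts a node actually contacted is to pull the URLs out of apt's `Hit:`/`Get:` lines in the log. The sketch below assumes that line format; `apt_hosts` and `unexpected_hosts` are hypothetical helper names, not part of MAAS.

```python
import re
from urllib.parse import urlparse

# apt prints "Hit:", "Get:", "Ign:" or "Err:", an index number, then the URL.
APT_FETCH = re.compile(r"\b(?:Hit|Get|Ign|Err):\d+ (\S+)")


def apt_hosts(log_text):
    """Return the set of host[:port] values apt contacted according to the log."""
    return {urlparse(m.group(1)).netloc for m in APT_FETCH.finditer(log_text)}


def unexpected_hosts(log_text, mirror_url):
    """Return contacted hosts other than the one named in mirror_url."""
    return apt_hosts(log_text) - {urlparse(mirror_url).netloc}


# Line from the original report (timestamp trimmed).
old_log = "cloud-init[2202]: Hit:1 http://archive.ubuntu.com/ubuntu xenial InRelease"
print(unexpected_hosts(old_log, "http://apt.cnct.io:8080/ubuntu/"))
```

An empty result would mean every apt request in the log went to the configured mirror.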

Changed in maas:
status: New → Incomplete
Revision history for this message
Jim Conner (snafuxnj) wrote :

The output in the original report is older. Here's the most recent repo settings and also output from attempted commissioning:

```
$ maas <PROFILE> package-repositories read
Success.
Machine-readable output follows:
[
    {
        "name": "main_archive",
        "url": "http://apt.cnct.io:8080/ubuntu/",
        "distributions": [],
        "disabled_pockets": [],
        "disabled_components": [],
        "disable_sources": true,
        "components": [],
        "arches": [
            "amd64"
        ],
        "key": "",
        "enabled": true,
        "id": 1,
        "resource_uri": "/MAAS/api/2.0/package-repositories/1/"
    },
    {
        "name": "ports_archive",
        "url": "http://apt.cnct.io:8080/ubuntu-ports/",
        "distributions": [],
        "disabled_pockets": [
            "backports",
            "security",
            "updates"
        ],
        "disabled_components": [
            "restricted",
            "universe",
            "multiverse"
        ],
        "disable_sources": true,
        "components": [],
        "arches": [
            "armhf",
            "arm64",
            "ppc64el",
            "s390x"
        ],
        "key": "",
        "enabled": true,
        "id": 2,
        "resource_uri": "/MAAS/api/2.0/package-repositories/2/"
    }
]
```
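
For anyone comparing several such dumps, the CLI output can be loaded in Python by skipping the human-readable preamble and parsing the JSON that follows. This is a sketch under that assumption about the output shape; `parse_maas_output` and `disabled_summary` are hypothetical names.

```python
import json


def parse_maas_output(text):
    """Parse the JSON payload of `maas ... read` output, skipping the preamble."""
    starts = [i for i in (text.find("["), text.find("{")) if i != -1]
    if not starts:
        raise ValueError("no JSON payload found")
    return json.loads(text[min(starts):])


def disabled_summary(repos):
    """Map each repository name to its disabled pockets and components."""
    return {
        repo["name"]: {
            "pockets": repo["disabled_pockets"],
            "components": repo["disabled_components"],
        }
        for repo in repos
    }


# Shortened version of the output above, keeping only the fields used here.
SAMPLE = """Success.
Machine-readable output follows:
[
    {"name": "main_archive", "disabled_pockets": [], "disabled_components": []},
    {"name": "ports_archive",
     "disabled_pockets": ["backports", "security", "updates"],
     "disabled_components": ["restricted", "universe", "multiverse"]}
]
"""
print(disabled_summary(parse_maas_output(SAMPLE)))
```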

Logs taken from the maas-enlisting-node log:

```
2019-11-05T19:56:44+00:00 maas-enlisting-node snapd[1730]: stateengine.go:102: state ensure error: Get https://api.snapcraft.io/api/v1/snaps/sections: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019-11-05T19:56:44+00:00 maas-enlisting-node systemd[1]: Started Wait until snapd is fully seeded.
2019-11-05T19:56:44+00:00 maas-enlisting-node systemd[1]: Starting Apply the settings specified in cloud-config...
2019-11-05T19:56:44+00:00 maas-enlisting-node systemd[1]: Reached target Multi-User System.
2019-11-05T19:56:44+00:00 maas-enlisting-node systemd[1]: Reached target Graphical Interface.
2019-11-05T19:56:44+00:00 maas-enlisting-node systemd[1]: Started Stop ureadahead data collection 45s after completed startup.
2019-11-05T19:56:44+00:00 maas-enlisting-node systemd[1]: Starting Update UTMP about System Runlevel Changes...
2019-11-05T19:56:44+00:00 maas-enlisting-node systemd[1]: Started Update UTMP about System Runlevel Changes.
2019-11-05T19:56:45+00:00 maas-enlisting-node cloud-init[2149]: ...
```

Revision history for this message
Jim Conner (snafuxnj) wrote :

I managed to get a commissioning node to get past enlistment so I could commission with the `allow ssh` option.

I noted a couple of interesting errors from cloud-init logs:

```
2019-11-05 20:51:47,862 - util.py[DEBUG]: Running command ['/var/lib/cloud/instance/scripts/runcmd'] with allowed return codes [0] (shell=False, capture=False)
2019-11-05 20:51:47,870 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/runcmd [127]
2019-11-05 20:51:47,870 - util.py[DEBUG]: Failed running /var/lib/cloud/instance/scripts/runcmd [127]
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 870, in runparts
    subp(prefix + [exe_path], capture=False)
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2065, in subp
    cmd=args)
cloudinit.util.ProcessExecutionError: Unexpected error while running command.
Command: ['/var/lib/cloud/instance/scripts/runcmd']
Exit code: 127
Reason: -
Stdout: -
Stderr: -
2019-11-05 20:51:47,877 - util.py[DEBUG]: Running command ['/var/lib/cloud/instance/scripts/user_data.sh'] with allowed return codes [0] (shell=False, capture=False)
2019-11-05 20:52:59,003 - cc_scripts_user.py[WARNING]: Failed to r...
```

Revision history for this message
Jim Conner (snafuxnj) wrote :

See attached logs...

Revision history for this message
Alberto Donato (ack) wrote :

What error do you see in the UI when commissioning the machine?

Also could you please attach full logs from /var/log/maas?

It seems you're currently using xenial as commissioning series. Do you get similar errors using bionic?

Changed in maas:
status: Incomplete → New
Revision history for this message
Björn Tillenius (bjornt) wrote :

Thanks for the cloud-init logs. I did find some minor bugs in MAAS by looking at them. But even after reproducing those errors in the logs, the machine still commissioned ok, and I was able to use it.

So, I will need some more information. The first is the exact steps you do to reproduce the issue, e.g. how you add the machine to MAAS, what actions you take on it until you commission it.

Then I also need to know what errors you see in the MAAS UI. The output from 'maas $profile machine read $systemid' would be good.

Also, it seems that you have Xenial set as the commissioning environment. Could you go to the MAAS settings and change that to Bionic? That might help.

I also need all the logs from /var/log/maas from the machine where MAAS is installed.

For the MAAS developers, there's a minor bug in that when a machine doesn't have any disks, we generate network configuration for the node. This shouldn't be done on commissioning, and it will also break on xenial deployments, since netplan isn't installed by default there. Yet the vendor data we produce tries to run 'netplan apply'. See src/metadataserver/vendor_data.py.

Changed in maas:
status: New → Incomplete
Revision history for this message
Jim Conner (snafuxnj) wrote :

Yup! I noticed that yesterday after I was able to finagle a commissioning machine to remain on and ssh-able.

OK, I'll get you all the information and try your suggestion. Also, I noticed yesterday when the machine I just talked about finally got into a running state, that `apt` did indeed update before attempting to install ipmitools et al. However, the output above clearly shows that an update is not being performed prior to installation attempts.

At any rate, let me try the things you're suggesting and I'll get the information to you ASAP.

Revision history for this message
Jim Conner (snafuxnj) wrote :

NT

Revision history for this message
Jim Conner (snafuxnj) wrote :

OK, prior to today, using xenial as the default commissioning image, the steps I took to commission a machine...

Actually, instead of trying to explain how I did it, I recorded a screencast to show you how I'm doing it. Please see the attached screencast.

Revision history for this message
Jim Conner (snafuxnj) wrote :

I apologize for the first part of my last comment. That whole first sentence can be ignored.

Revision history for this message
Lee Trager (ltrager) wrote :

I'm having trouble duplicating this as well. When MAAS generates the vendor data for cloud-init MAAS includes what repositories to use. This configuration is applied before any package is installed. If you can reproduce the issue what is the output of

curl http://$MAAS_REGION:5240/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed
curl http://$MAAS_REGION:5240/MAAS/metadata/latest/by-id/$SYSTEM_ID/?op=get_preseed
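
The same two requests can be made from Python when `curl` is not at hand. This is a minimal sketch: `preseed_urls` and `fetch` are hypothetical helper names, the URL paths are copied from the commands above, and `fetch` needs network access to the region controller.

```python
from urllib.request import urlopen


def preseed_urls(region_host, system_id, port=5240):
    """Build the two metadata URLs from the curl commands above."""
    base = f"http://{region_host}:{port}/MAAS/metadata/latest"
    return {
        "enlist": f"{base}/enlist-preseed/?op=get_enlist_preseed",
        "by_id": f"{base}/by-id/{system_id}/?op=get_preseed",
    }


def fetch(url, timeout=10):
    """Return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (not run here because it needs a reachable region controller):
#   for name, url in preseed_urls("192.168.2.1", "<system id>").items():
#       print(name, fetch(url), sep="\n")
```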

I see the netplan apply bug and have filed it separately in LP:1851622.

Revision history for this message
Jim Conner (snafuxnj) wrote :

I can reproduce it 100% of the time, but I don't know from where you want me to run the `curl`s. Here's the output from the controller:

```
$ curl http://192.168.2.1:5240/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed
#cloud-config
apt:
  preserve_sources_list: false
  primary:
  - arches: [amd64, i386]
    uri: http://apt.cnct.io:8080/ubuntu/
  - arches: [default]
    uri: http://apt.cnct.io:8080/ubuntu-ports/
  proxy: http://192-168-2-0--24.maas-internal:8000/
  security:
  - arches: [amd64, i386]
    uri: http://apt.cnct.io:8080/ubuntu/
  - arches: [default]
    uri: http://apt.cnct.io:8080/ubuntu-ports/
  sources_list: 'deb $PRIMARY $RELEASE main

    # deb-src $PRIMARY $RELEASE main

    '
datasource:
  MAAS: {metadata_url: 'http://192.168.2.1:5240/MAAS/metadata/enlist'}
manage_etc_hosts: true
packages: [python3-yaml, python3-oauthlib, freeipmi-tools, ipmitool, sshpass, archdetect-deb,
  jq]
power_state: {condition: test ! -e /tmp/block-poweroff, delay: now, mode: poweroff,
  timeout: 1800}
rsyslog:
  remotes: {maas: '192.168.2.1:5247'}
```

The packages specified are the packages that fail to install, which I surmise is happening because cloud-init scripts are not running `apt update` before attempting to install.
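
To see what the `sources_list` template in this preseed expands to, the placeholders can be substituted by hand. The sketch below uses Python's `string.Template` as a stand-in for cloud-init's own rendering, which is an assumption; the mirror URL comes from the preseed and `bionic` is just an example release.

```python
from string import Template

# sources_list value from the enlist preseed above, with blank lines dropped.
SOURCES_LIST = "deb $PRIMARY $RELEASE main\n# deb-src $PRIMARY $RELEASE main\n"


def render_sources(template, primary, release):
    """Fill in the $PRIMARY and $RELEASE placeholders."""
    return Template(template).substitute(PRIMARY=primary, RELEASE=release)


def enabled_entries(rendered):
    """Return (suite, components) for each active (uncommented) deb line."""
    entries = []
    for line in rendered.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "deb":
            entries.append((parts[2], parts[3:]))
    return entries


rendered = render_sources(SOURCES_LIST, "http://apt.cnct.io:8080/ubuntu/", "bionic")
print(enabled_entries(rendered))  # [('bionic', ['main'])]
```

Rendered this way, the enlist preseed yields a single active entry: the release suite with only the `main` component, and no `-updates` suite or `universe` component.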

Revision history for this message
Jim Conner (snafuxnj) wrote :

```
2019-11-07T21:21:52+00:00 maas-enlisting-node cloud-init[1928]: Hit:1 http://apt.cnct.io:8080/ubuntu bionic InRelease
2019-11-07T21:21:53+00:00 maas-enlisting-node cloud-init[1928]: Reading package lists...
2019-11-07T21:21:53+00:00 maas-enlisting-node cloud-init[1928]: Reading package lists...
2019-11-07T21:21:53+00:00 maas-enlisting-node cloud-init[1928]: Building dependency tree...
2019-11-07T21:21:53+00:00 maas-enlisting-node cloud-init[1928]: Reading state information...
2019-11-07T21:21:53+00:00 maas-enlisting-node cloud-init[1928]: Package ipmitool is not available, but is referred to by another package.
2019-11-07T21:21:53+00:00 maas-enlisting-node cloud-init[1928]: This may mean that the package is missing, has been obsoleted, or
2019-11-07T21:21:53+00:00 maas-enlisting-node cloud-init[1928]: is only available from another source
2019-11-07T21:21:53+00:00 maas-enlisting-node cloud-init[1928]: E: Package 'ipmitool' has no installation candidate
2019-11-07T21:21:53+00:00 maas-enlisting-node cloud-init[1928]: E: Unable to locate package sshpass
2019-11-07T21:21:53+00:00 maas-enlisting-node cloud-init[1928]: E: Unable to locate package jq
```

Revision history for this message
Jim Conner (snafuxnj) wrote :

```
$ curl http://192.168.2.1:5240/MAAS/metadata/latest/by-id/agh3ec/?op=get_preseed
#cloud-config
apt: {preserve_sources_list: false, proxy: 'http://192-168-2-0--24.maas-internal:8000/',
  sources_list: 'deb http://apt.cnct.io:8080/ubuntu/ $RELEASE restricted main multiverse
    universe

    # deb-src http://apt.cnct.io:8080/ubuntu/ $RELEASE restricted main multiverse
    universe

    deb http://apt.cnct.io:8080/ubuntu/ $RELEASE-updates restricted main multiverse
    universe

    # deb-src http://apt.cnct.io:8080/ubuntu/ $RELEASE-updates restricted main multiverse
    universe

    deb http://apt.cnct.io:8080/ubuntu/ $RELEASE-security restricted main multiverse
    universe

    # deb-src http://apt.cnct.io:8080/ubuntu/ $RELEASE-security restricted main multiverse
    universe

    deb http://apt.cnct.io:8080/ubuntu/ $RELEASE-backports restricted main multiverse
    universe

    # deb-src http://apt.cnct.io:8080/ubuntu/ $RELEASE-backports restricted main multiverse
    universe

    '}
datasource:
  MAAS: {consumer_key: <REDACTED>, metadata_url: 'http://192.168.2.1:5240/MAAS/metadata/',
    token_key: <REDACTED>, token_secret: <REDACTED>}
manage_etc_hosts: true
packages: [python3-yaml, python3-oauthlib, freeipmi-tools, ipmitool, sshpass]
reporting:
  maas: {consumer_key: <REDACTED>, endpoint: 'http://192.168.2.1:5240/MAAS/metadata/status/agh3ec',
    token_key: <REDACTED>, token_secret: <REDACTED>,
    type: webhook}
rsyslog:
  remotes: {maas: 'None:5247'}
```

Revision history for this message
Jim Conner (snafuxnj) wrote :

Some steps I took today which yielded interesting results.

  * revert to xenial commission image

Now, moving on:

  * maas Delete machine
  * power machine on (via remote console)
  * it starts to auto-commission -- no IPMI credentials yet
  * commission fails (same problem mentioned in ticket -- apt update doesn't happen),
    * commission gives auto-generated hostname
  * host halts, maas UI thinks commission is still happening
  * do the following:
    * maas UI: abort commission
    * maas UI: host shows in `New` state
    * maas UI: click `New` machine -> configuration -> set IPMI stuffs
    * maas UI: commission host, set allow SSH - it will fail. However, apt-get update runs
    * Once commission is finished, though failed, one can ssh into the host now.

Now looking at `/var/lib/cloud/instance/scripts/user_data.sh` on the broken host, there is code around determining if `apt-get update` should run or not. This is very confusing. Why not just let `apt-get update` run indiscriminately? Apt is smart enough to know whether the pkgcache db is up to date or not. This seems to be the thing that is causing my problem, I surmise.

base: /usr/lib/python3/dist-packages/metadataserver

Looking at: user_data/templates/snippets/tests/test_maas_run_remote_scripts.py
line 275 function: test_install_dependencies_runs_apt_get_update_when_required() (test: seems inert)

and user_data/templates/snippets/maas_run_remote_scripts.py
line: 142 function: _install_apt_dependencies()

Revision history for this message
Jim Conner (snafuxnj) wrote :

The steps above are not 100% accurate. I've created screencasts that show exactly what's going on in the UI as well as the console.

machine 1 - Xenial commission image
https://drive.google.com/open?id=1qORygPxy50mm9yezRI43MJwnlQoeCJ5F
https://drive.google.com/open?id=1y-1DLHzUmwAs_iYpKCp3zCup-SciNKta

machine 2 - Bionic commission image
https://drive.google.com/open?id=1QJaV4qzGKXhjj7Hw7xRr-gleF5vQ34e6
https://drive.google.com/open?id=1QO-Vfsfhv8LH49Op-3WMEMQV4Wn6J7T_

See attached latest logs, which correlate with the vids above.

Revision history for this message
Lee Trager (ltrager) wrote :

It looks like you disabled a number of pockets and components. Can you enable them? Your package-repository config should look like this with the url replaced by your mirror

$ maas $PROFILE package-repositories read
Success.
Machine-readable output follows:
[
    {
        "name": "ports_archive",
        "url": "http://ports.ubuntu.com/ubuntu-ports",
        "distributions": [],
        "disabled_pockets": [],
        "disabled_components": [],
        "disable_sources": true,
        "components": [],
        "arches": [
            "armhf",
            "arm64",
            "ppc64el",
            "s390x"
        ],
        "key": "",
        "enabled": true,
        "id": 2,
        "resource_uri": "/MAAS/api/2.0/package-repositories/2/"
    },
    {
        "name": "main_archive",
        "url": "http://archive.ubuntu.com/ubuntu",
        "distributions": [],
        "disabled_pockets": [],
        "disabled_components": [],
        "disable_sources": true,
        "components": [],
        "arches": [
            "amd64",
            "i386"
        ],
        "key": "",
        "enabled": true,
        "id": 1,
        "resource_uri": "/MAAS/api/2.0/package-repositories/1/"
    }
]

Revision history for this message
Lee Trager (ltrager) wrote :

WRT your other question. Your failures are happening when cloud-init tries to install a set of packages required for maas_run_remote_scripts to run. It appears that only main is getting configured and the packages MAAS needs are from updates and universe. Because of that failure maas_run_remote_scripts never runs.

The reason maas_run_remote_scripts checks if the apt cache is available before running apt update is so it's only done once and only when it's needed. Apt checks if the cache is out of date based on the system time. If NTP isn't configured this could be wrong, so cloud-init/MAAS needs to ensure this happens before any package is installed. Once it's done we don't want it to happen again, as that will cause additional load on the apt server and network traffic.
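
The "update once, only when needed" idea can be sketched as follows. This is not the code in maas_run_remote_scripts: the function names, the check for `_Packages` files under `/var/lib/apt/lists`, and the injectable `run` callable are all assumptions made for illustration.

```python
import os


def apt_lists_present(lists_dir):
    """Return True if lists_dir holds at least one downloaded package index."""
    if not os.path.isdir(lists_dir):
        return False
    return any("_Packages" in name for name in os.listdir(lists_dir))


def install_packages(packages, run, lists_dir="/var/lib/apt/lists"):
    """Install packages, running `apt-get update` first only if no index exists.

    `run` is any callable taking an argv list, for example
    `lambda argv: subprocess.check_call(argv)`.
    """
    if not apt_lists_present(lists_dir):
        run(["apt-get", "-qy", "update"])
    run(["apt-get", "-qy", "install", *packages])
```

Passing `run` in as a parameter keeps the sketch testable without root access or a real apt installation.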

Revision history for this message
Jim Conner (snafuxnj) wrote :

Our current (latest) repositories:
```
$ maas $PROFILE package-repositories read
Success.
Machine-readable output follows:
[
    {
        "name": "main_archive",
        "url": "http://apt.cnct.io:8080/ubuntu/",
        "distributions": [],
        "disabled_pockets": [],
        "disabled_components": [],
        "disable_sources": true,
        "components": [],
        "arches": [
            "amd64"
        ],
        "key": "",
        "enabled": true,
        "id": 1,
        "resource_uri": "/MAAS/api/2.0/package-repositories/1/"
    },
    {
        "name": "ports_archive",
        "url": "http://apt.cnct.io:8080/ubuntu-ports/",
        "distributions": [],
        "disabled_pockets": [
            "backports",
            "security",
            "updates"
        ],
        "disabled_components": [
            "restricted",
            "universe",
            "multiverse"
        ],
        "disable_sources": true,
        "components": [],
        "arches": [
            "armhf",
            "arm64",
            "ppc64el",
            "s390x"
        ],
        "key": "",
        "enabled": true,
        "id": 2,
        "resource_uri": "/MAAS/api/2.0/package-repositories/2/"
    }
]
```

Revision history for this message
Jim Conner (snafuxnj) wrote :

OK. I apologize. I was attempting to run a test over cell connection and that wasn't the best connection for a remote console connection.

So I just re-ran the test over a better Internet connection. I can now confirm you were correct that the disabled components and pockets on the ports repository were in fact the problem.

OK, so that leads me to some questions about how to be sure others don't run into this problem.

  Objective:
  * Attempting to get maas to function properly in an air-gapped environment

  System Requirements:
  * An *exact* replica of https://archive.ubuntu.com/ubuntu *at least*. Use `rsync` to grab replica:
    rsync -avv --info=progress2,all3 rsync://archive.ubuntu.com/ubuntu /path/to/local/repo-base
  * properly set up webserver to serve up repo
  * naturally your egress FW rules will require port 873 to the addresses above to be open in order to rsync.
  * Attempting to create an internal apt repo differing from the repo above using something like apt-mirror or the like will require a bit of massaging and will likely be error prone.

  MaaS Requirements:
  * CHANGE ONLY THE FIRST TWO PRIMARY REPOSITORY ENDPOINT **URLS**! Both must be changed to match your internal address. Leave everything else alone!!!! *DO NOT CHANGE ANYTHING ELSE*
    - anything differing will likely cause issues with enlisting/commissioning/deploying
  * Obviously any additional repositories maintained besides the main two repos will likely not be problematic.
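
One way to sanity-check a mirror before pointing MAAS at it is to confirm that the index for each pocket of the release is being served. The sketch below is an assumption-laden example: it relies on the standard `dists/<suite>/InRelease` archive layout, and `index_urls` and `is_reachable` are hypothetical names.

```python
from urllib.request import Request, urlopen

# Suite suffixes for the release pocket plus updates, security and backports.
POCKETS = ("", "-updates", "-security", "-backports")


def index_urls(mirror_base, release, pockets=POCKETS):
    """Return the InRelease URL for each pocket of a release on the mirror."""
    base = mirror_base.rstrip("/")
    return [f"{base}/dists/{release}{pocket}/InRelease" for pocket in pockets]


def is_reachable(url, timeout=10):
    """Return True if a HEAD request for url succeeds with status 200."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False


# Example (needs network access to the mirror):
#   for url in index_urls("http://apt.cnct.io:8080/ubuntu/", "bionic"):
#       print(url, is_reachable(url))
```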

For me, the local apt repository has been a true thorn in my side during implementation and testing. While I understand the necessity for the tight coupling, none of this was documented.

One thing that was not tested was leaving the two primary addresses alone and adding an additional internal repo with the necessary tooling (not documented to the best of my knowledge).

---------------------

I think now that this issue is confirmed to not necessarily be a bug (however, very unintuitive), it was definitely less than ideal and at least requires documentation.

Lee Trager (ltrager)
Changed in maas:
status: Incomplete → Confirmed
importance: Undecided → Medium
milestone: none → 2.7.0alpha1
Revision history for this message
Lee Trager (ltrager) wrote :

Thanks for confirming that. I agree our docs could be better and MAAS should give some warnings. MAAS allows users to disable components and pockets as it's possible for a user to create their own big repository with everything they need.

@design - MAAS should show a warning if the universe component is disabled. As seen here if using a mirror this breaks commissioning. The UI should also show a warning if updates or security pocket is disabled. It would also be nice to show a warning if there isn't a repository for every architecture. That may be a bit more difficult as we split architectures between amd64/i386 and everything else.
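
The proposed warnings could be prototyped against the `package-repositories read` JSON along these lines. This is a sketch, not MAAS code: `repository_warnings` is a hypothetical name, and the rules are taken only from the list in the comment above.

```python
REQUIRED_COMPONENTS = ("universe",)
REQUIRED_POCKETS = ("updates", "security")


def repository_warnings(repos, wanted_arches):
    """Return warning strings for disabled pockets/components and missing arches."""
    warnings = []
    covered = set()
    for repo in repos:
        if not repo.get("enabled", True):
            continue
        covered.update(repo["arches"])
        for component in REQUIRED_COMPONENTS:
            if component in repo["disabled_components"]:
                warnings.append(f"{repo['name']}: component '{component}' is disabled")
        for pocket in REQUIRED_POCKETS:
            if pocket in repo["disabled_pockets"]:
                warnings.append(f"{repo['name']}: pocket '{pocket}' is disabled")
    for arch in sorted(set(wanted_arches) - covered):
        warnings.append(f"no enabled repository covers architecture '{arch}'")
    return warnings


# Relevant fields of the configuration reported earlier in this thread.
REPOS = [
    {"name": "main_archive", "enabled": True, "arches": ["amd64"],
     "disabled_pockets": [], "disabled_components": []},
    {"name": "ports_archive", "enabled": True,
     "arches": ["armhf", "arm64", "ppc64el", "s390x"],
     "disabled_pockets": ["backports", "security", "updates"],
     "disabled_components": ["restricted", "universe", "multiverse"]},
]
for message in repository_warnings(REPOS, ["amd64", "i386"]):
    print(message)
```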

@docs - We should add all of this to our documentation.

tags: added: ui
Changed in maas:
milestone: 2.7.0b1 → 2.7.0b2
Changed in maas:
status: Confirmed → New
Revision history for this message
Lilyana Videnova (lilyanavidenova) wrote :
Changed in maas:
status: New → Triaged
Changed in maas:
milestone: 2.7.0b2 → 2.7.0rc1
Changed in maas:
milestone: 2.7.0rc1 → 2.7.0b3
Changed in maas-ui:
status: Unknown → New
Revision history for this message
Adam Collard (adam-collard) wrote :

Since this is not a regression, I'm bumping it out of the 2.7.0 release

Changed in maas:
milestone: 2.7.0b3 → none
Changed in maas-ui:
status: New → Fix Released
Changed in maas-ui:
status: Fix Released → Unknown
Changed in maas-ui:
status: Unknown → New