MAAS 2.8 production mode sometimes loses connection when finished downloading an image
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Expired | Undecided | Unassigned |
Bug Description
Sometimes, when running MAAS 2.8/candidate in production mode in an LXD container, MAAS blanks the lower part of the screen and shows a "Connection lost, reconnecting...." message. Running "maas status" sometimes takes a very long time but brings the connection back. This usually happens right after the first image finishes downloading, after installation and configuration.
In these cases, "maas status" eventually returns:
unix://
"snap stop maas" and "snap restart maas" eventually return:
error: cannot communicate with server: timeout exceeded while waiting for response
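Because "maas status" and the snap commands can hang for a long time before returning, it may help to run them under an explicit time limit so the hang is recorded rather than waited out. The following is only a minimal sketch: `run_with_timeout` is a hypothetical helper name, and the time limit passed to it is an arbitrary choice, not something from this report.

```python
import subprocess


def run_with_timeout(cmd, seconds):
    """Run cmd (a list of arguments) with a time limit.

    Returns (finished, returncode, output). When the limit is hit,
    finished is False and returncode is None.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # capture stderr together with stdout
            text=True,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return False, None, ""
    return True, proc.returncode, proc.stdout
```

Inside the container this could be called as `run_with_timeout(["maas", "status"], 60)`; a result with `finished` set to `False` would correspond to the hang described above.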
"ps -ef | grep maas" shows only these items:
root@maas-2-8-t2:~# ps -ef | grep maas
root 1177 1 0 22:14 ? 00:00:00 snapfuse /var/lib/
root 1387 1 0 22:16 ? 00:10:21 snapfuse /var/lib/
root 20528 380 0 22:38 ? 00:00:00 grep --color=auto maas
root@maas-2-8-t2:~#
"reboot"-ing the container takes a very long time to run, and does not reboot the container, it remains STOPPED.
Attempting to restart the container with "lxc start" produces this error:
Error: Common start logic: saving config file for the container failed
Try `lxc info --show-log maas-2-8-t2` for more info
"lxc info --show-log maas-2-8-t2" produces exactly this, not including the separator line below:
Name: maas-2-8-t2
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/06/04 22:12 UTC
Status: Stopped
Type: container
Profiles: maas
Log:
----
"lxc start maas-2-8-t2 --debug" produces the following long output, not including the separator line below:
DBUG[06-
DBUG[06-
DBUG[06-
DBUG[06-
{
"config": {},
"api_extensions": [
"storage_
"container_
"container_
"container_
"auth_pki",
"container_
"etag",
"patch",
"usb_devices",
"https_
"image_
"directory_
"container_
"storage_
"storage_
"network",
"profile_
"container_
"container_
"certificate
"container_
"gpu_devices",
"container_
"migration_
"id_map",
"network_
"network_
"storage",
"file_delete",
"file_append",
"network_
"storage_
"storage_
"network_vlan",
"image_
"container_
"container_
"storage_
"unix_
"storage_
"storage_
"network_
"storage_
"entity_
"image_
"storage_
"id_map_base",
"file_symlinks",
"container_
"network_
"storage_
"container_
"container_
"storage_
"storage_
"resource_
"storage_
"storage_
"storage_
"resources",
"kernel_limits",
"storage_
"macaroon_
"network_sriov",
"console",
"restrict_
"migration_
"infiniband",
"maas_network",
"devlxd_events",
"proxy",
"network_
"file_
"network_
"unix_
"storage_
"operation_
"clustering",
"event_
"storage_
"nvidia_
"container_
"container_
"devlxd_images",
"container_
"proxy_unix",
"proxy_udp",
"clustering_
"proxy_
"network_state",
"proxy_
"container_
"unix_
"pprof_http",
"proxy_
"network_
"proxy_nat",
"network_
"container_
"candid_
"backup_
"candid_config",
"nvidia_
"storage_
"storage_
"projects",
"candid_
"network_
"container_
"usb_
"snapshot_
"container_
"clustering_
"clustering_
"container_
"snapshot_
"container_
"snapshot_
"network_
"resources_
"resources_gpu",
"resources_
"kernel_
"id_
"event_
"storage_
"network_
"container_
"rbac",
"cluster_
"seccomp_
"lxc_features",
"container_
"network_
"storage_
"container_
"resources_v2",
"container_
"container_
"container_
"storage_
"resources_
"daemon_
"instances",
"image_types",
"resources_
"clustering_
"images_expiry",
"resources_
"backup_
"ceph_
"container_
"compression
"container_
"container_
"container_
"container_
"virtual-
"image_
"clustering_
"resources_
"storage_
"vm_
"unix_
"api_filtering",
"instance_
"clustering_
"firewall_
"projects_
"container_
"limits_
"container_
"projects_
"custom_
"volume_
"trust_
"snapshot_
"clustering_
"container_
"container_
"resources_
"resources_
"resources_
"api_os",
"container_
"container_
"container_
"resources_
"images_
],
"api_status": "stable",
"api_version": "1.0",
"auth": "trusted",
"public": false,
"auth_methods": [
"tls"
],
"environment": {
"addresses": [],
"architectures": [
"x86_64",
"i686"
],
"certificate": "-----BEGIN CERTIFICATE-
"certificate
"driver": "lxc",
"driver_
"firewall": "xtables",
"kernel": "Linux",
"kernel_
"kernel_
"netnsid_
"seccomp_
"seccomp_
"shiftfs": "false",
"uevent_
"unpriv_
},
"kernel_
"lxc_features": {
"cgroup2": "true",
"mount_
"network_
"network_
"network_
"network_
"network_
"pidfd": "true",
"seccomp_
},
"os_name": "Ubuntu",
"os_version": "19.10",
"project": "default",
"server": "lxd",
"server_
"server_name": "stormrider-yoga",
"server_pid": 2773,
"server_
"storage": "zfs",
"storage_
}
}
DBUG[06-
DBUG[06-
DBUG[06-
{
"architecture": "x86_64",
"config": {
"image.
"image.
"image.label": "release",
"image.os": "ubuntu",
"image.release": "bionic",
"image.serial": "20191114",
"image.type": "squashfs",
"image.version": "18.04",
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
},
"devices": {},
"ephemeral": false,
"profiles": [
"maas"
],
"stateful": false,
"description": "",
"created_at": "2020-06-
"expanded_
"image.
"image.
"image.label": "release",
"image.os": "ubuntu",
"image.release": "bionic",
"image.serial": "20191114",
"image.type": "squashfs",
"image.version": "18.04",
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
"volatile.
},
"expanded_
"eth0": {
"name": "eth0",
"nictype": "bridged",
"parent": "lxdbr0",
"type": "nic"
},
"root": {
"path": "/",
"pool": "default",
"type": "disk"
},
"virbr0": {
"nictype": "bridged",
"parent": "virbr0",
"type": "nic"
}
},
"name": "maas-2-8-t2",
"status": "Stopped",
"status_code": 102,
"last_used_at": "2020-06-
"location": "none",
"type": "container"
}
DBUG[06-
DBUG[06-
DBUG[06-
{
"action": "start",
"timeout": 0,
"force": false,
"stateful": false
}
DBUG[06-
DBUG[06-
{
"id": "8391e635-
"class": "task",
"description": "Starting container",
"created_at": "2020-06-
"updated_at": "2020-06-
"status": "Running",
"status_code": 103,
"resources": {
"containers": [
"/1.
]
},
"metadata": null,
"may_cancel": false,
"err": "",
"location": "none"
}
DBUG[06-
DBUG[06-
DBUG[06-
{
"id": "8391e635-
"class": "task",
"description": "Starting container",
"created_at": "2020-06-
"updated_at": "2020-06-
"status": "Running",
"status_code": 103,
"resources": {
"containers": [
"/1.
]
},
"metadata": null,
"may_cancel": false,
"err": "",
"location": "none"
}
Error: Common start logic: saving config file for the container failed
Try `lxc info --show-log maas-2-8-t2` for more info
----
The only solution is to delete the container and create a new one.
I currently have one working production container and one production container in the failed state described above.
description: updated
Changed in maas: status: New → Triaged
Note: I have determined that the container restart failure is related to the disk space available to LXD. By deleting enough other containers, I can get the container back to a running state, though I should note that I do not have many containers (about 6 or 7) running at any given time. The other issues seem consistent, especially needing to run "maas status" to get MAAS connected again when the connection drops.
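Since the start failure turned out to depend on free disk space, a quick check before starting a container might catch the condition early. This is a rough sketch under stated assumptions: the function names and the 10% threshold are illustrative, and the check only looks at the filesystem that holds a given path. With a ZFS-backed pool, as in the debug output above, the pool's own usage figures (for example from `lxc storage info default`) are likely the more relevant number.

```python
import shutil


def free_fraction(path):
    """Return the free fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_low_on_space(path, min_free_fraction=0.10):
    """True when less than min_free_fraction of the filesystem is free.

    The 10% default is an arbitrary illustrative threshold.
    """
    return free_fraction(path) < min_free_fraction
```

Calling `is_low_on_space("/")` on the host would flag a nearly full root filesystem; the path to check depends on where the LXD storage actually lives, which this report does not state.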