Can't bring up GPU worker; docker daemon fails.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Kubernetes Worker Charm | Fix Released | Undecided | Joseph Borg |
Bug Description
Hi,
First, thank you so much for creating this awesome charm! I successfully deployed it on-prem with my Dell R720s around six months ago.
However, it fails when I try to rebuild it once again, this time with cs:bundle/
From "juju debug-log --replay --include kubernetes-
------
unit-kubernetes... (repeated log lines, truncated in the original report)
------
From "root@Bubnicki:~# journalctl -xe -u docker"
------
Jan 22 17:34:30 Bubnicki dockerd[40216]: unable to configure the Docker daemon with file /etc/docker/
Jan 22 17:34:30 Bubnicki systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
------
It seems like a conflict between the systemd docker startup file and the docker daemon.json file, but I can't tell which option is duplicated (dockerd exits at startup when the same option is specified both as a flag in ExecStart and in daemon.json).
Below are the daemon.json and the systemd startup config file. Thank you so much for your help!
------
root@Bubnicki:~# cat /etc/docker/
{"runtimes": {"nvidia": {"path": "nvidia-
------
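For reference, the daemon.json that the nvidia-docker2 package typically installs looks roughly like the following. The file above is truncated, so the exact contents on this host may differ; in particular the `default-runtime` key is an assumption, not taken from this report:

```json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```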
------
root@Bubnicki:~# cat /lib/systemd/
[Unit]
Description=Docker Application Container Engine
Documentation=https:/
After=network.
Requires=
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
EnvironmentFile
ExecStart=
ExecReload=
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
[Install]
WantedBy=
------
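When an engine option really is set in both places, a common way to resolve it (a sketch, not taken from this report) is to keep options in exactly one place: clear the flags from ExecStart with a systemd drop-in and leave the configuration only in daemon.json. The drop-in path and the minimal ExecStart line below are assumptions:

```
# /etc/systemd/system/docker.service.d/override.conf  (hypothetical drop-in)
[Service]
# The first empty ExecStart= clears the value inherited from the packaged
# unit file; the second starts dockerd with no engine options on the
# command line, so everything comes from daemon.json.
ExecStart=
ExecStart=/usr/bin/dockerd -H fd://
```

This would be followed by `systemctl daemon-reload` and `systemctl restart docker`.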
Best,
Yen
I fixed the problem with the following steps.
On the worker node (package versions: docker-ce=18.06.0~ce~3-0~ubuntu, nvidia-container-runtime=2.0.0+docker18.06.0-1, nvidia-docker2=2.0.3+docker18.06.0-1):
#apt-get remove nvidia-docker2
#apt-get remove nvidia-
#apt-get remove docker-ce
#apt-get install docker-
#apt-get install nvidia-
#apt-get install nvidia-
#pkill -SIGHUP dockerd
#docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
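Put together, the remove/reinstall sequence above would look like the transcript below. The truncated package names are filled in as assumptions, using the versions listed earlier in this comment:

```
# apt-get remove nvidia-docker2
# apt-get remove nvidia-container-runtime
# apt-get remove docker-ce
# apt-get install docker-ce=18.06.0~ce~3-0~ubuntu
# apt-get install nvidia-container-runtime=2.0.0+docker18.06.0-1
# apt-get install nvidia-docker2=2.0.3+docker18.06.0-1
# pkill -SIGHUP dockerd
# docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```

The final `docker run` verifies that the NVIDIA runtime is wired up by running `nvidia-smi` inside a CUDA container.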
On MAAS
#juju resolved kubernetes-worker/3