Kubernetes-core on lxd-cluster breaks lxd database connection
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Charmed Kubernetes Bundles | Triaged | Medium | Unassigned |
Bug Description
I deployed the kubernetes-core bundle on an LXD cluster. The cluster has two nodes and is set up with a fan network.
At some point during the deployment, something breaks the LXD database connection. Juju can no longer connect to LXD, and LXD does not respond even to simple commands:
```
ubuntu@kubecon04:~$ sudo lxc info
Error: cannot fetch node config from database: driver: bad connection
```
The service status looks fine, but if I restart it, it no longer starts:
```
ubuntu@kubecon04:~$ systemctl status snap.lxd.
Failed to dump process list, ignoring: No such file or directory
● snap.lxd.
Loaded: loaded (/etc/systemd/
Active: active (running) since Mon 2019-11-11 21:07:19 UTC; 40min ago
Listen: /var/snap/
CGroup: /system.
ubuntu@kubecon04:~$ systemctl restart snap.lxd.
Failed to restart snap.lxd.
See system logs and 'systemctl status snap.lxd.
ubuntu@kubecon04:~$ sudo systemctl restart snap.lxd.
Job for snap.lxd.
See "systemctl status snap.lxd.
ubuntu@kubecon04:~$
ubuntu@kubecon04:~$
ubuntu@kubecon04:~$ systemctl status snap.lxd.
● snap.lxd.
Loaded: loaded (/etc/systemd/
Active: inactive (dead) since Mon 2019-11-11 21:48:07 UTC; 2s ago
Listen: /var/snap/
```
Here's a dump of the LXD database:
```
ubuntu@kubecon04:~$ sudo sqlite3 /var/snap/
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE schema (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
version INTEGER NOT NULL,
updated_at DATETIME NOT NULL,
UNIQUE (version)
);
INSERT INTO schema VALUES(
CREATE TABLE config (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
key VARCHAR(255) NOT NULL,
value TEXT,
UNIQUE (key)
);
INSERT INTO config VALUES(
INSERT INTO config VALUES(
CREATE TABLE patches (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
name VARCHAR(255) NOT NULL,
applied_at DATETIME NOT NULL,
UNIQUE (name)
);
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
INSERT INTO patches VALUES(
CREATE TABLE raft_nodes (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
address TEXT NOT NULL,
UNIQUE (address)
);
INSERT INTO raft_nodes VALUES(
INSERT INTO raft_nodes VALUES(
DELETE FROM sqlite_sequence;
INSERT INTO sqlite_sequence VALUES('schema',1);
INSERT INTO sqlite_sequence VALUES(
INSERT INTO sqlite_sequence VALUES('config',3);
INSERT INTO sqlite_sequence VALUES(
COMMIT;
```
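The dump above includes a `raft_nodes` table, which holds the addresses of the cluster's database nodes. As a minimal sketch, assuming Python's standard `sqlite3` module is available on the host, the rows can be read without dumping the whole file. The helper name `list_raft_nodes` is hypothetical, and the database path is taken from the command line rather than hard-coded, since the path in the transcript is truncated. It is probably safer to run this against a copy of the file than against the live database.

```python
import sqlite3
import sys


def list_raft_nodes(conn):
    """Return (id, address) rows from the raft_nodes table, ordered by id."""
    cur = conn.execute("SELECT id, address FROM raft_nodes ORDER BY id")
    return cur.fetchall()


if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.argv[1] is the path to the SQLite file (supplied by the caller;
    # not taken from the bug report). Open it read-only via a file: URI.
    connection = sqlite3.connect(f"file:{sys.argv[1]}?mode=ro", uri=True)
    try:
        for node_id, address in list_raft_nodes(connection):
            print(node_id, address)
    finally:
        connection.close()
```

With two cluster members, this should print two rows, one per `INSERT INTO raft_nodes` statement in the dump.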
Here is the juju status output:
```
ubuntu@kubecon04:~$ juju status
Model            Controller          Cloud/Region        Version  SLA          Timestamp
kubernetes-core  lxd-remote-default  lxd-remote/default  2.6.10   unsupported  21:51:15Z

App                Version  Status   Scale  Charm              Store       Rev  OS      Notes
containerd                  waiting      0  containerd         jujucharms   33  ubuntu
easyrsa            3.0.1    active       1  easyrsa            jujucharms  278  ubuntu
etcd               3.2.10   active       1  etcd               jujucharms  460  ubuntu
flannel                     waiting      0  flannel            jujucharms  450  ubuntu
kubernetes-master  1.16.2   waiting    0/1  kubernetes-master  jujucharms  754  ubuntu  exposed
kubernetes-worker  1.16.2   blocked    0/1  kubernetes-worker  jujucharms  590  ubuntu  exposed

Unit        Workload  Agent  Machine  Public address  Ports     Message
easyrsa/0*  active    idle   0/lxd/0  10.98.184.17              Certificate Authority connected.
etcd/0*     active    idle   0        240.67.178.105  2379/tcp  Healthy with 1 known peer
kubernetes-
kubernetes-

Machine  State    DNS             Inst id              Series  AZ  Message
0        started  240.67.178.105  juju-4aa30b-0        bionic      Running
0/lxd/0  started  10.98.184.17    juju-4aa30b-0-lxd-0  bionic      Container started
1        started  240.67.242.56   juju-4aa30b-1        bionic      Running
```
I will attach the debug-log in the first comment of this bug.
From the LXD debug output, I get this error:
```
EROR[11-
: address already in use"
```
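An "address already in use" error means that LXD tried to bind a listening address that another socket already holds. The error line above is truncated, so the actual address and port are not known here. As a rough sketch, the check below tries to bind a given host and port and reports whether the bind fails with `EADDRINUSE`. The function name `port_in_use` and the host and port arguments are placeholders for illustration, not values from this report.

```python
import errno
import socket


def port_in_use(host, port):
    """Return True if binding (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            # Any other failure (e.g. permission denied) is not what we test for.
            raise
    return False
```

Calling `port_in_use(host, port)` with the address LXD is configured to listen on, while LXD is stopped, would show whether some other process has taken that address.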
Destroying the model doesn't work either...
```
ubuntu@kubecon04:~$ juju destroy-model kubernetes-core
WARNING! This command will destroy the "kubernetes-core" model.
This includes all machines, applications, data and other resources.
Continue [y/N]? y
Destroying modelUnable to get the model status from the API: getting storage provider registry: Get https:/
```
This might be more of an LXD bug than a Kubernetes bug, but in any case it is an incompatibility between the two.
The upload of the debug-log keeps failing in Launchpad, so here's a short snippet of it (the end):
```
py:42:joined:client
easyrsa.py:231:message
easyrsa.py:240:send_ca
easyrsa.py:277:publish_global_client_cert
easyrsa.py:90:set_easyrsa_version
0.juju-log Invoking reactive handler: reactive/easyrsa.py:90:set_easyrsa_version
0.juju-log Invoking reactive handler: reactive/easyrsa.py:231:message
0.juju-log Invoking reactive handler: reactive/easyrsa.py:240:send_ca
-master-0: 21:31:34 ERROR juju.worker.uniter resolver loop error: getting storage provider registry: cannot fetch node config from database: driver: bad connection
-master-0: 21:31:34 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: getting storage provider registry: cannot fetch node config from database: driver: bad connection
0.juju-log Invoking reactive handler: reactive/easyrsa.py:277:publish_global_client_cert
0.juju-log Invoking reactive handler: hooks/relations/tls-certificates/provides.py:42:joined:client
0.juju-log Invoking reactive handler: reactive/etcd.py:429:process_snapd_timer
0.juju-log Get config refresh.timer for snap core
0.juju-log Invoking reactive handler: hooks/relations/tls-certificates/requires.py:79:joined:certificates
-master-0: 21:33:02 ERROR juju.worker.uniter resolver loop error: getting storage provider registry: cannot fetch node config from database: driver: bad connection
-master-0: 21:33:02 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: getting storage provider registry: cannot fetch node config from database: driver: bad connection
-worker-0: 21:33:05 ERROR juju.worker.uniter resolver loop error: getting storage provider registry: cannot fetch node config from database: driver: bad connection
-worker-0: 21:33:05 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: getting storage provider registry: cannot fetch node config from database: driver: bad connection
-master-0: 21:34:46 ERROR juju.worker.uniter resolver loop error: getting storage provider registry: cannot fetch node config from database: driver: bad connection
-master-0: 21:34:46 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: getting storage provider registry: cannot fetch node config from database: driver: bad connection
-worker-0: 21:35:00 ERROR juju.worker.uniter resolver loop error: getting storage provider registry: cannot fetch node config from database: driver: bad connection
-worker-0: 21:35:00 ERROR juju.worker.dependency "uniter" man...
/provides.
tracer: ++ queue handler reactive/
tracer: ++ queue handler reactive/
tracer: ++ queue handler reactive/
tracer: ++ queue handler reactive/
unit-easyrsa-0: 21:31:33 INFO unit.easyrsa/
unit-easyrsa-0: 21:31:33 INFO unit.easyrsa/
unit-easyrsa-0: 21:31:33 INFO unit.easyrsa/
unit-kubernetes
unit-kubernetes
unit-easyrsa-0: 21:31:34 INFO unit.easyrsa/
unit-easyrsa-0: 21:31:35 INFO unit.easyrsa/
unit-etcd-0: 21:31:35 INFO unit.etcd/
unit-etcd-0: 21:31:36 INFO unit.etcd/
unit-etcd-0: 21:31:36 INFO unit.etcd/
unit-kubernetes
unit-kubernetes
unit-kubernetes
unit-kubernetes
unit-kubernetes
unit-kubernetes
unit-kubernetes
unit-kubernetes
```