Bug #1865439 “Juju 2.8-beta1.3273 cannot deploy/remove applications” : Bugs : Canonical Juju

Barry Price (barryprice) on 2020-03-02

description:

updated

Ian Booth (wallyworld) wrote on 2020-03-03:

#1

Do we have any debug-logs?
This is not an issue we tend to see, so logs that help identify the cause would help us understand what's happening.

tags:

added: k8s

Barry Price (barryprice) wrote on 2020-03-04:

#2

Unfortunately it seems to stop logging at the time of the upgrade.

The controller pod is then destroyed and replaced, and nothing shows up in logs (even at --level DEBUG) for the controller after that. Here's the full log from the controller (upgrade started at 11:58:25):

https://paste.ubuntu.com/p/DzNBddrfB9/

In other model logs, post-upgrade we see only ERROR lines and all controller functionality is lost (again, with the upgrade performed at 11:58:25):

https://paste.ubuntu.com/p/FhFp95jCRr/

There's no /var/log/juju directory on the controller pod, so I'm not sure where to look for on-disk logs. The 2.7 controller pod is destroyed very quickly upon upgrade, so any useful info stored locally there is going to be difficult to retrieve.

Ian Booth (wallyworld) wrote on 2020-03-19:

#3

There is a /var/log/juju directory with logs in the api-server container. For example, assuming the controller is called "foo":

$ kubectl -n controller-foo exec -ti controller-0 -c api-server -- bash
$ ls /var/log/juju/
audit.log lease.log machine-lock.log

The controller pod has two containers, one for MongoDB and one for the controller agent, so you need to use -c to specify which container to exec into (either mongodb or api-server).
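The same logs can also be copied out non-interactively, which helps when the pod may be replaced soon after an upgrade. The sketch below is a hypothetical helper, not part of Juju; it assumes the "controller-<name>" namespace and "controller-0" pod name from the example above, and simply streams each file in /var/log/juju through `kubectl exec ... -- cat`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Juju): copy the files under /var/log/juju
# out of the api-server container of a Kubernetes-hosted controller.
# Assumes the "controller-<name>" namespace and "controller-0" pod name.
fetch_controller_logs() {
  local controller="$1" outdir="${2:-.}"
  local ns="controller-${controller}"
  local files f
  mkdir -p "$outdir" || return 1
  # List whatever log files exist; the set differs between Juju versions.
  files=$(kubectl -n "$ns" exec controller-0 -c api-server -- ls /var/log/juju) || return 1
  for f in $files; do
    # Stream each file through stdout instead of opening an interactive shell.
    kubectl -n "$ns" exec controller-0 -c api-server -- cat "/var/log/juju/$f" \
      > "$outdir/$f" || echo "warning: could not read $f" >&2
  done
}
```

For the "foo" controller, `fetch_controller_logs foo ./foo-logs` would leave one local file per log (audit.log, machine-lock.log, and so on) in ./foo-logs.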

Given there are a number of possible causes here, the beta is evolving daily, and we haven't had any other reports of similar issues, I'll go ahead and mark this as Incomplete. But please re-open with any extra info if it happens again.

Changed in juju:
status:	New → Incomplete

Barry Price (barryprice) wrote on 2020-03-19:

#4

Ah, I was obviously on the wrong container, apologies.

From 2.7.4 a "juju-upgrade" on the controller now puts me onto 2.8-beta1.3350, but I can still reproduce.

I cannot upgrade models once on this version (LP:1867224), so this is happening on a 2.7.4 model that existed before the controller upgrade. I can repeat the experiment with a fresh 2.8 beta model instead if that's any use.

A deploy command appears to execute without error, but watching 'juju status' shows no progress from:

Model   Controller       Cloud/Region     Version  SLA          Timestamp
wptest  myk8s-localhost  myk8s/localhost  2.7.4    unsupported  22:35:28+07:00

App        Version  Status   Scale  Charm          Store  Rev  OS          Address  Notes
wordpress           waiting    0/1  wordpress-k8s  local    0  kubernetes           agent initializing

Unit         Workload  Agent       Address  Ports  Message
wordpress/0  waiting   allocating                  agent initializing

After a while I gave up and attempted to destroy-model:

$ juju destroy-model wptest -y
Destroying model
Waiting for model to be removed, 1 application(s)...............................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
..............................................................ERROR timeout after 30m0s timeout
$

Unfortunately there's not much in the way of logs on the beta controller container:

root@controller-0:/var/log/juju# ls -lart
total 304
drwxr-xr-x 1 root   root   4096 Mar 19 14:30 ..
-rw-r----- 1 syslog adm       0 Mar 19 14:30 machine-lock.log
drw-r--r-- 2 root   root   4096 Mar 19 14:30 .
-rw-r----- 1 syslog adm  298128 Mar 19 15:04 audit.log
root@controller-0:/var/log/juju#

Here's a paste of audit.log:

https://paste.ubuntu.com/p/vvBzBGqGKT/

Changed in juju:
status:	Incomplete → New

Barry Price (barryprice) wrote on 2020-03-19:

#5

Sorry, previous paste was a double-paste.

Here's a single one:

https://paste.ubuntu.com/p/gf6MkMM5ny/

Ian Booth (wallyworld) wrote on 2020-07-16:

#6

When this happens, what we really need is the output of juju dump-model (both of the model being destroyed and the controller model, with any secrets redacted).
That will show what is stuck in the dying state and hence holding up graceful removal of the model.
The controller logs around the time of destroy would also help.

Can we get an example of that info to enable us to work on diagnosing what is happening? Does --force work? Using --force bypasses Juju's expectation that things will shut down cleanly, and it eventually just removes model entities regardless.

There are usually two root causes: storage not being detached or volumes not being removed, or units not leaving scope because their relation-departed/relation-broken hooks do not complete successfully.
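The information requested above can be gathered in one go before retrying with --force. This is a minimal sketch with hypothetical function names; it assumes the controller model is named "controller" (the default) and uses only `juju dump-model`, `juju debug-log` (`--replay`, `--no-tail`, `--level`) and `juju destroy-model --force`. The dump-model output should still have any secrets redacted by hand before it is attached to a bug.

```shell
#!/usr/bin/env bash
# Hypothetical helper: capture the state asked for in #6 for a model whose
# destroy-model is stuck. Assumes the controller model is named "controller".
collect_destroy_diagnostics() {
  local model="$1" outdir="${2:-.}"
  mkdir -p "$outdir" || return 1
  # Model being destroyed: look for entities stuck in the dying state.
  juju dump-model -m "$model" > "$outdir/${model}-dump.yaml" || return 1
  # Controller model.
  juju dump-model -m controller > "$outdir/controller-dump.yaml" || return 1
  # Controller logs so far: replay from the start and exit instead of following.
  juju debug-log -m controller --replay --no-tail --level DEBUG \
    > "$outdir/controller-debug.log"
}

# Hypothetical helper: only after the diagnostics are captured, force removal
# so Juju stops waiting for entities to shut down cleanly.
force_destroy_model() {
  juju destroy-model --force -y "$1"
}
```

For the model in this report, that would be `collect_destroy_diagnostics wptest ./wptest-diag` followed by `force_destroy_model wptest`. Capturing first matters because a forced removal discards exactly the stuck entities that the dumps are meant to show.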

tags:

added: destroy-model

Pen Gale (pengale) on 2020-08-27

Changed in juju:
status:	New → Incomplete

Launchpad Janitor (janitor) wrote on 2020-10-27:

#7

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status:	Incomplete → Expired
