'max-logs-age' panic on bootstrap with lxd provider

Bug #1688635 reported by Mike McCracken
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
High
Ian Booth

Bug Description

Bootstrap failed - timing out with 'unable to contact api server' - on a LXD provider.

Ubuntu version 16.04
LXD version 2.13 (from http//ppa.launchpad.net/ubuntu-lxc/lxd-stable/ubuntu)
Juju version 2.2 beta3 (a recent fork from the juju/develop branch)

The LXD container was brought up, and looking at the machine-0.log, it ends with the following panic:

panic: empty value for "max-logs-age" found in configuration (type <nil>, val <nil>)

goroutine 424 [running]:
github.com/juju/juju/controller.Config.mustString(0xc42076cb10, 0x25f2d48, 0xc, 0x0, 0x0)
        /tmp/tmpGWjOtu/juju-core_2.2-beta3/src/github.com/juju/juju/controller/config.go:191 +0x1f0
github.com/juju/juju/controller.Config.MaxLogsAge(0xc42076cb10, 0xc42076cb10)
        /tmp/tmpGWjOtu/juju-core_2.2-beta3/src/github.com/juju/juju/controller/config.go:290 +0x40
github.com/juju/juju/worker/dblogpruner.(*pruneWorker).loop(0xc420628350, 0xc4204554a0, 0x0, 0x0)
        /tmp/tmpGWjOtu/juju-core_2.2-beta3/src/github.com/juju/juju/worker/dblogpruner/worker.go:78 +0x3cd
github.com/juju/juju/worker/dblogpruner.(*pruneWorker).(github.com/juju/juju/worker/dblogpruner.loop)-fm(0xc4204554a0, 0xc4204554a0, 0xc4207b6180)
        /tmp/tmpGWjOtu/juju-core_2.2-beta3/src/github.com/juju/juju/worker/dblogpruner/worker.go:42 +0x34
github.com/juju/juju/worker.NewSimpleWorker.func1(0xc4207b6180, 0xc420628370)
        /tmp/tmpGWjOtu/juju-core_2.2-beta3/src/github.com/juju/juju/worker/simpleworker.go:23 +0x6a
created by github.com/juju/juju/worker.NewSimpleWorker
        /tmp/tmpGWjOtu/juju-core_2.2-beta3/src/github.com/juju/juju/worker/simpleworker.go:24 +0x5e

Revision history for this message
Ian Booth (wallyworld) wrote :

This seems very much like a mismatch between source and agent binary. It's possible for that to happen when the source used to compile the bootstrap client is older than tip and it picks up an incompatible binary from streams. This happens semi-regularly with forks which then go stale.

I can't reproduce this at all from tip of devel, which btw current reports 2.2 beta4.

Could you see if it's reproducible from develop tip; ie rebase your fork.

Changed in juju:
status: New → Incomplete
Revision history for this message
Felipe Reyes (freyes) wrote :

I'm using juju 2.2-beta4 from the snap (latest/beta), when trying to restore a controller from a backup, jujud crashed with this same error:

panic: empty value for "max-logs-age" found in configuration (type <nil>, val <nil>)

goroutine 424 [running]:
github.com/juju/juju/controller.Config.mustString(0xc4206f8750, 0x261ce31, 0xc, 0x0, 0x0)
        /tmp/tmpBWTQVa/juju-core_2.2-beta4/src/github.com/juju/juju/controller/config.go:191 +0x1f0
github.com/juju/juju/controller.Config.MaxLogsAge(0xc4206f8750, 0xc4206f8750)
        /tmp/tmpBWTQVa/juju-core_2.2-beta4/src/github.com/juju/juju/controller/config.go:290 +0x40
github.com/juju/juju/worker/dblogpruner.(*pruneWorker).loop(0xc4206ff000, 0xc420751da0, 0x0, 0x0)
        /tmp/tmpBWTQVa/juju-core_2.2-beta4/src/github.com/juju/juju/worker/dblogpruner/worker.go:78 +0x3cd
github.com/juju/juju/worker/dblogpruner.(*pruneWorker).(github.com/juju/juju/worker/dblogpruner.loop)-fm(0xc420751da0, 0xc420751da0, 0xc420789920)
        /tmp/tmpBWTQVa/juju-core_2.2-beta4/src/github.com/juju/juju/worker/dblogpruner/worker.go:42 +0x34
github.com/juju/juju/worker.NewSimpleWorker.func1(0xc420789920, 0xc4206ff010)
        /tmp/tmpBWTQVa/juju-core_2.2-beta4/src/github.com/juju/juju/worker/simpleworker.go:23 +0x6a
created by github.com/juju/juju/worker.NewSimpleWorker
        /tmp/tmpBWTQVa/juju-core_2.2-beta4/src/github.com/juju/juju/worker/simpleworker.go:24 +0x5e

This is the output on the client side:

$ juju restore-backup -b -m controller --file test.tar.gz
Looking for packaged Juju agent version 2.2-beta4 for amd64
To configure your system to better support LXD containers, please see: https://github.com/lxc/lxd/blob/master/doc/production-setup.md
Launching controller instance(s) on snowspeeder/snowspeeder...
 - juju-ec8b3d-0 (arch=amd64)
Juju GUI installation has been disabled
Waiting for address
Attempting to connect to 192.168.10.17:22
Bootstrap agent now started
Contacting Juju controller at 192.168.10.17 to verify accessibility...
ERROR unable to contact api server after 30 attempts: codec.ReadHeader error: error receiving message: read tcp 192.168.88.240:49970->192.168.10.17:17070: read: connection reset by peer

A copy of the backup used can be found at http://people.canonical.com/~freyes/test.tar.gz

Changed in juju:
status: Incomplete → New
Revision history for this message
Ian Booth (wallyworld) wrote :

A workaround is to first restore the backup with the same version of Juju client used to create the backup. Then run upgrade-juju to bring the controller up to the latest version. The upgrade will ensure the database is migrated to match the latest Juju.

Revision history for this message
Felipe Reyes (freyes) wrote :

Ian, this is the sequence of versions and operations that this deployment has lived, it's my long standing (more than 1 month) development environment, I just do add-model/destroy-model there.

- Initial bootstrap was made using 2.1 (or maybe even 2.0) from ppa:juju/stable
- Upgraded client to 2.1.2 (from ppa:juju/stable) then `juju upgrade-juju ...`
- Removed the juju's deb package and installed the snap from latest/stable (2.1.3)
- juju upgrade-juju, the controller is running 2.1.3 now
- juju model-config -m controller agent-stream=devel
- snap refresh --classic --beta juju
- juju sync-tools
- juju upgrade-juju --agent-version 2.2-beta4
- juju create-backup -f test.tar.gz
- lxc delete --force juju-xxxx # the controller container
- juju restore-backup -b -m controller --file test.tar.gz

So during the backup creation everything was 2.2-beta4 and during the recovery too.

I will try this set of steps with a fresh bootstrap:

- bootstrap 2.1.3
- upgrade to beta4
- create backup
- lxc delete juju-xxx #controller container
- juju restore-backup -b -m controller --file test.tar.gz

Revision history for this message
Felipe Reyes (freyes) wrote :
Download full text (9.2 KiB)

Ian,

Here it's the full log of how I reproduced this from scratch

 ~ $ sudo snap refresh --stable --classic juju
juju 2.1.3 from 'canonical' refreshed
 ~ $ juju --version
2.1.3-zesty-amd64
 ~ $ juju bootstrap snowspeeder snowspeeder # this a remote lxd server
Creating Juju controller "snowspeeder" on snowspeeder/snowspeeder
Looking for packaged Juju agent version 2.1.3 for amd64
To configure your system to better support LXD containers, please see: https://github.com/lxc/lxd/blob/master/doc/production-setup.md
Launching controller instance(s) on snowspeeder/snowspeeder...
 - juju-908f67-0 (arch=amd64)
Fetching Juju GUI 2.7.2
Waiting for address
Attempting to connect to 192.168.10.16:22
Warning: Permanently added '192.168.10.16' (ECDSA) to the list of known hosts.
Logging to /var/log/cloud-init-output.log on the bootstrap machine
Running apt-get update
Running apt-get upgrade
Installing curl, cpu-checker, bridge-utils, cloud-utils, tmux
Fetching Juju agent version 2.1.3 for amd64
Installing Juju machine agent
Starting Juju machine agent (service jujud-machine-0)
Bootstrap agent now started
Contacting Juju controller at 192.168.10.16 to verify accessibility...
Bootstrap complete, "snowspeeder" controller now available.
Controller machines are in the "controller" model.
Initial model "default" added.
 ~ $ juju deploy -n 20 cs:ubuntu
Located charm "cs:ubuntu-10".
Deploying charm "cs:ubuntu-10".
 ~ $ juju status
Model Controller Cloud/Region Version
default snowspeeder snowspeeder/snowspeeder 2.1.3

App Version Status Scale Charm Store Rev OS Notes
ubuntu 16.04 active 20 ubuntu jujucharms 10 ubuntu

Unit Workload Agent Machine Public address Ports Message
ubuntu/0* active idle 0 192.168.10.15 ready
ubuntu/1 active idle 1 192.168.10.45 ready
[...]
 ~ $ juju create-backup --filename test-juju-2.1.3.tar.gz
ERROR backups are not supported for hosted models
 ~ $ juju create-backup -m controller --filename test-juju-2.1.3.tar.gz
20170601-130949.ed86d806-95f9-48a8-842b-0045de908f67
downloading to test-juju-2.1.3.tar.gz
 ~ $ ls -lh test-juju-2.1.3.tar.gz
-rw-r--r-- 1 freyes freyes 49M Jun 1 09:10 test-juju-2.1.3.tar.gz
 ~ $ juj^C
 ~ $ sudo snap refresh --classic --beta juju
[sudo] password for freyes:
juju (beta) 2.2-beta4 from 'canonical' refreshed
 ~ $ juju --version
2.2-beta4-zesty-amd64
 ~ $ juju model-config -m controller agent-stream=devel
 ~ $ juju sync-tools
^C
 ~ $ juju model-config -m controller logging-config
<root>=WARNING;unit=DEBUG
 ~ $ juju model-config -m controller logging-config="<root>=DEBUG;unit=DEBUG"
 ~ $ juju sync-tools --debug --verbose
 09:21:37 INFO juju.cmd supercommand.go:63 running juju [2.2-beta4 gc go1.8]
09:21:37 DEBUG juju.cmd supercommand.go:64 args: []string{"/snap/juju/1837/bin/juju", "sync-tools", "--debug", "--verbose"}
09:21:37 INFO juju.juju api.go:73 connecting to API addresses: [192.168.10.16:17070]
09:21:37 DEBUG juju.api apiclient.go:683 dialing "wss://192.168.10.16:17070/model/ab1d7a76-c9e2-46e7-886f-8b77c86d57ea/api"
09:21:37 DEBUG juju.api apiclien...

Read more...

Revision history for this message
Ian Booth (wallyworld) wrote :

The issue here is that the system was bootstrapped with 2.1.3 and so didn't initially have some new controller settings introduced in 2.2 beta3/4. Upgrading Juju does add those new settings to the Juju database in mongo, but there's a bootstrap-config.yaml file used by restore which was not updated.

We'll do a quick fix for the 2.2 release and address the root cause for 2.3.

Changed in juju:
milestone: none → 2.2-rc1
importance: Undecided → High
status: New → In Progress
assignee: nobody → Christian Muirhead (2-xtian)
Ian Booth (wallyworld)
Changed in juju:
assignee: Christian Muirhead (2-xtian) → Ian Booth (wallyworld)
Revision history for this message
Ian Booth (wallyworld) wrote :
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.