(2.5-beta1) juju upgrade-series from Trusty to Xenial hangs up

Bug #1804701 reported by Xav Paice
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
Critical
Joseph Phillips

Bug Description

I have an environment with Maas 2.3.5, Juju 2.5-beta1, with Trusty machines. Trying to run an upgrade from Trusty to Xenial.

Machine 14 is a KVM instance, no LXCs.

juju upgrade-series 14 prepare xenial
WARNING: This command will mark machine "14" as being upgraded to series "xenial".
This operation cannot be reverted or canceled once started.
Units running on the machine will also be upgraded. These units include:
  grafana/2
  filebeat/38
  nrpe-physical/0

Leadership for the following applications will be pinned and not
subject to change until the "complete" command is run:
  filebeat
  grafana
  nrpe-physical

Continue [y/N]?y
machine-14 started upgrade series from "trusty" to "xenial"
grafana/2 pre-series-upgrade hook running
grafana/2 pre-series-upgrade completed
nrpe-physical/0 pre-series-upgrade hook running
nrpe-physical/0 pre-series-upgrade hook not found, skipping
filebeat/38 pre-series-upgrade hook running
filebeat/38 pre-series-upgrade completed

<hangs here>

I had expected the command to return at some stage, it's been sitting for 45 mins with no response.

Machine logs on machine 14:
2018-11-22 20:50:43 INFO juju.worker.upgradeseries worker.go:171 machine series upgrade status is "prepare started"
2018-11-22 20:50:43 INFO juju.worker.upgradeseries worker.go:361 unpin leader for application "filebeat"
2018-11-22 20:50:43 INFO juju.worker.upgradeseries worker.go:361 unpin leader for application "grafana"
2018-11-22 20:50:43 INFO juju.worker.upgradeseries worker.go:361 unpin leader for application "nrpe-physical"
2018-11-22 20:50:43 INFO juju.worker.upgradeseries worker.go:171 machine series upgrade status is "prepare started"
2018-11-22 20:50:43 INFO juju.worker.upgradeseries worker.go:171 machine series upgrade status is "prepare started"
2018-11-22 20:50:50 INFO juju.worker.upgradeseries worker.go:171 machine series upgrade status is "prepare started"
2018-11-22 20:50:50 INFO juju.worker.upgradeseries worker.go:171 machine series upgrade status is "prepare started"
2018-11-22 20:50:50 INFO juju.worker.upgradeseries worker.go:171 machine series upgrade status is "prepare started"
2018-11-22 20:50:50 INFO juju.worker.upgradeseries worker.go:171 machine series upgrade status is "prepare started"
2018-11-22 20:50:50 INFO juju.worker.upgradeseries worker.go:171 machine series upgrade status is "prepare started"
2018-11-22 20:50:50 INFO juju.worker.upgradeseries worker.go:171 machine series upgrade status is "prepare started"
2018-11-22 20:50:50 INFO juju.worker.upgradeseries worker.go:171 machine series upgrade status is "prepare started"
2018-11-22 20:50:51 INFO juju.worker.upgradeseries worker.go:171 machine series upgrade status is "prepare started"
2018-11-22 20:50:54 INFO juju.worker.upgradeseries worker.go:171 machine series upgrade status is "prepare started"
2018-11-22 20:50:54 INFO juju.worker.upgradeseries worker.go:217 preparing service units for series upgrade
2018-11-22 20:50:54 ERROR juju.service.systemd service.go:155 dbus link request failed for application "jujud-unit-grafana-2": No such method 'LinkUnitFiles'
2018-11-22 20:50:54 INFO juju.service agentconf.go:196 failed to write service for unit-grafana-2: dbus link request failed for application "jujud-unit-grafana-2": No such method 'LinkUnitFiles'
2018-11-22 20:50:54 ERROR juju.service.systemd service.go:155 dbus link request failed for application "jujud-unit-nrpe-physical-0": No such method 'LinkUnitFiles'
2018-11-22 20:50:54 INFO juju.service agentconf.go:196 failed to write service for unit-nrpe-physical-0: dbus link request failed for application "jujud-unit-nrpe-physical-0": No such method 'LinkUnitFiles'
2018-11-22 20:50:54 ERROR juju.service.systemd service.go:155 dbus link request failed for application "jujud-unit-filebeat-38": No such method 'LinkUnitFiles'
2018-11-22 20:50:54 INFO juju.service agentconf.go:196 failed to write service for unit-filebeat-38: dbus link request failed for application "jujud-unit-filebeat-38": No such method 'LinkUnitFiles'
2018-11-22 20:50:54 ERROR juju.service.systemd service.go:155 dbus link request failed for application "jujud-machine-14": No such method 'LinkUnitFiles'
2018-11-22 20:50:54 INFO juju.service agentconf.go:196 failed to write service for machine-14: dbus link request failed for application "jujud-machine-14": No such method 'LinkUnitFiles'
2018-11-22 20:50:54 ERROR juju.worker.dependency engine.go:632 "upgrade-series" manifold worker returned unexpected error: failed to write agents: unit-grafana-2, unit-nrpe-physical-0, unit-filebeat-38, machine-14: dbus link request failed for application "jujud-machine-14": No such method 'LinkUnitFiles'

The unit agents report INFO juju.worker.uniter.operation runhook.go:138 ran "pre-series-upgrade" hook (or skipped if it's missing).

Revision history for this message
Xav Paice (xavpaice) wrote :
Download full text (5.8 KiB)

Further info, if I ctrl-c the prepare, and continue with the upgrade, when I try to complete:

$ juju upgrade-series --debug 14 complete
22:41:06 INFO juju.cmd supercommand.go:56 running juju [2.5-beta1 gc go1.10.4]
22:41:06 DEBUG juju.cmd supercommand.go:57 args: []string{"/snap/juju/5875/bin/juju", "upgrade-series", "--debug", "14", "complete"}
22:41:06 INFO juju.juju api.go:67 connecting to API addresses: [10.245.88.10:17070]
22:41:06 DEBUG juju.api apiclient.go:883 successfully dialed "wss://10.245.88.10:17070/model/12ce57e9-789d-45c8-84ee-0b182d95b47b/api"
22:41:06 INFO juju.api apiclient.go:603 connection established to "wss://10.245.88.10:17070/model/12ce57e9-789d-45c8-84ee-0b182d95b47b/api"
22:41:06 INFO cmd upgradeseries.go:330 machine-14 binaries and service files written
machine-14 complete phase started
22:41:07 INFO cmd upgradeseries.go:330 machine-14 started unit agents after series upgrade
22:41:07 INFO cmd upgradeseries.go:330 grafana/2 post-series-upgrade hook running
22:41:45 INFO cmd upgradeseries.go:330 grafana/2 post-series-upgrade completed
22:41:45 INFO cmd upgradeseries.go:330 filebeat/38 post-series-upgrade hook running
22:41:53 INFO cmd upgradeseries.go:330 nrpe-physical/0 post-series-upgrade hook running
22:41:53 INFO cmd upgradeseries.go:330 nrpe-physical/0 post-series-upgrade hook not found, skipping

<sits>

machine log:
2018-11-22 22:33:35 INFO juju.cmd supercommand.go:56 running jujud [2.5-beta1 gc go1.11.2]
2018-11-22 22:33:35 DEBUG juju.cmd supercommand.go:57 args: []string{"/var/lib/juju/tools/machine-14/jujud", "machine", "--data-dir", "/var/lib/juju", "--machine-id", "14", "--debug"}
2018-11-22 22:33:35 DEBUG juju.utils gomaxprocs.go:24 setting GOMAXPROCS to 8
2018-11-22 22:33:35 DEBUG juju.agent agent.go:545 read agent config, format "2.0"
2018-11-22 22:33:35 INFO juju.cmd.jujud agent.go:133 setting logging config to "<root>=INFO;unit=DEBUG"
2018-11-22 22:33:35 INFO juju.worker.upgradesteps worker.go:74 upgrade steps for 2.5-beta1 have already been run.
2018-11-22 22:33:35 INFO juju.api apiclient.go:603 connection established to "wss://10.245.88.10:17070/model/12ce57e9-789d-45c8-84ee-0b182d95b47b/api"
2018-11-22 22:33:36 INFO juju.worker.upgrader upgrader.go:155 abort check blocked until version event received
2018-11-22 22:33:36 INFO juju.worker.upgrader upgrader.go:161 unblocking abort check
2018-11-22 22:33:36 INFO juju.worker.upgrader upgrader.go:194 desired agent binary version: 2.5-beta1
2018-11-22 22:33:37 INFO juju.worker.migrationminion worker.go:116 migration phase is now: NONE
2018-11-22 22:33:37 INFO juju.worker.upgradeseries worker.go:171 machine series upgrade status is "prepare completed"
2018-11-22 22:33:37 INFO juju.worker.diskmanager diskmanager.go:67 block devices changed: [{sda [/dev/disk/by-id/ata-QEMU_HARDDISK_QM00001 /dev/disk/by-path/pci-0000:00:01.1-ata-1] ata-QEMU_HARDDISK_QM00001 102400 true } {sda1 [/dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-part1 /dev/disk/by-label/root /dev/disk/by-path/pci-0000:00:01.1-ata-1-part1 /dev/disk/by-uuid...

Read more...

Revision history for this message
Xav Paice (xavpaice) wrote :

On a machine with LXDs as well, the upgrade prepare completed OK but the complete didn't, with this in the machine log:

2018-11-23 03:28:43 INFO juju.packaging.manager utils.go:105 Retrying: apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --assume-yes --quiet install --target-release trusty-backports lxd
2018-11-23 03:28:53 INFO juju.packaging.manager utils.go:105 Retrying: apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --assume-yes --quiet install --target-release trusty-backports lxd
2018-11-23 03:28:53 ERROR juju.packaging.manager utils.go:109 packaging command failed: exit status 100; cmd: "apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --assume-yes --quiet install --target-release trusty-backports lxd"; output: Reading package lists...
E: The value 'trusty-backports' is invalid for APT::Default-Release as such a release is not available in the sources

2018-11-23 03:28:53 WARNING juju.provisioner container_initialisation.go:145 not stopping machine agent container watcher due to error: setting up container dependencies on host machine: packaging command failed: exit status 100
2018-11-23 03:28:53 ERROR juju.provisioner container_initialisation.go:124 starting container provisioner for lxd: setting up container dependencies on host machine: packaging command failed: exit status 100
2018-11-23 03:28:56 INFO juju.provisioner container_initialisation.go:116 initial container setup with ids: [0/lxd/14 0/lxd/2 0/lxd/4 0/lxd/1 0/lxd/10 0/lxd/11 0/lxd/5 0/lxd/6 0/lxd/7 0/lxd/9 0/lxd/0 0/lxd/12 0/lxd/3]

Revision history for this message
Joseph Phillips (manadart) wrote :

For the KVM machine, can you provide the series of its host?

Regarding the LXD issue, can you provide the host series too, and also whether LXD was installed by alternate means such us the Snap?

Changed in juju:
assignee: nobody → Joseph Phillips (manadart)
importance: Undecided → High
milestone: none → 2.5-beta2
status: New → Triaged
Revision history for this message
Xav Paice (xavpaice) wrote :

For the KVM machine, the host is Trusty.

For the LXD, the host was Trusty, got upgraded to Xenial, then I upgraded the LXD on it.

LXD is/was installed from the deb, it's currently version 2.0.11-0ubuntu1~16.04.4

The host machines were originally deployed by Juju using 2.4.6 and Maas 2.3.5, and unmodified.

Changed in juju:
status: Triaged → In Progress
Revision history for this message
Joseph Phillips (manadart) wrote :

Thanks for the extra info. I can see what is going on here.

The KVM issue:

What is the source of this image? What should happen is that in the absence of the path "/run/systemd/system", the upgrade steps perform manual creation and linking of service unit files for each agent. But (I think) the path seems to be present, resulting in calls to the DBUS API of a method that is not available.

In any case, this scenario will be better handled.

The LXD issue:

This is a race with the container provisioner worker starting before upgrade-series worker has set the new series and crashing the agent.

Again, the fix is known and forthcoming.

Revision history for this message
Joseph Phillips (manadart) wrote :

LXD issue will be rectified when this one lands:
https://github.com/juju/juju/pull/9505

Changed in juju:
importance: High → Critical
Revision history for this message
Joseph Phillips (manadart) wrote :

Issue observed with the KVM container is addressed here:
https://github.com/juju/juju/pull/9512

Changed in juju:
status: In Progress → Fix Committed
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.5-beta2 → 2.5-beta3
Changed in juju:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.