Series upgrade doesn't start unit agents

Bug #1749201 reported by Chris MacNaughton on 2018-02-13
This bug affects 3 people
Affects: juju
Importance: High
Assigned to: Heather Lanigan

Bug Description

2018-02-13 14:35:11 DEBUG juju.worker.dependency engine.go:486 "unit-agent-deployer" manifold worker started
2018-02-13 14:35:11 DEBUG juju.service discovery.go:63 discovered init system "systemd" from series "xenial"
2018-02-13 14:35:12 INFO juju.worker.deployer deployer.go:115 checking unit "ubuntu/0"
2018-02-13 14:35:12 INFO juju.worker.deployer deployer.go:158 deploying unit "ubuntu/0"
2018-02-13 14:35:12 DEBUG juju.service discovery.go:111 failed to find init system "upstart": exec "/sbin/initctl" failed: exit status 1
2018-02-13 14:35:12 DEBUG juju.service discovery.go:115 discovered init system "systemd" from local host
2018-02-13 14:35:12 DEBUG juju.worker.dependency engine.go:504 "unit-agent-deployer" manifold worker stopped: cannot read agent metadata in directory /var/lib/juju/tools/2.3.2-xenial-amd64: open /var/lib/juju/tools/2.3.2-xenial-amd64/downloaded-tools.txt: no such file or directory
2018-02-13 14:35:12 ERROR juju.worker.dependency engine.go:551 "unit-agent-deployer" manifold worker returned unexpected error: cannot read agent metadata in directory /var/lib/juju/tools/2.3.2-xenial-amd64: open /var/lib/juju/tools/2.3.2-xenial-amd64/downloaded-tools.txt: no such file or directory
2018-02-13 14:35:12 DEBUG juju.worker.dependency engine.go:553 stack trace:
cannot read agent metadata in directory /var/lib/juju/tools/2.3.2-xenial-amd64: open /var/lib/juju/tools/2.3.2-xenial-amd64/downloaded-tools.txt: no such file or directory
github.com/juju/juju/worker/deployer/simple.go:127:

On the unit in question, we can see the trusty tools but not xenial ones:

ubuntu@juju-cf2ccb-0:~$ ls /var/lib/juju/tools/2.3.2-trusty-amd64/downloaded-tools.txt
/var/lib/juju/tools/2.3.2-trusty-amd64/downloaded-tools.txt
ubuntu@juju-cf2ccb-0:~$ ls /var/lib/juju/tools/2.3.2-xenial-amd64/downloaded-tools.txt
ls: cannot access '/var/lib/juju/tools/2.3.2-xenial-amd64/downloaded-tools.txt': No such file or directory
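The check above can be wrapped in a small helper for use on several units. This is only a sketch: the function name `tools_metadata_present` and its argument order are made up for illustration, and the path layout is assumed from the log lines in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of juju): succeeds only if the agent
# metadata file exists for the given version/series/arch combination.
# Layout assumed from the log: <tools_root>/<version>-<series>-<arch>/downloaded-tools.txt
tools_metadata_present() {
    tools_root="$1"
    agent_version="$2"
    series="$3"
    arch="$4"
    [ -f "$tools_root/$agent_version-$series-$arch/downloaded-tools.txt" ]
}

# Example (commented out so the block has no side effects):
# tools_metadata_present /var/lib/juju/tools 2.3.2 xenial amd64 \
#     || echo "xenial tools metadata is missing"
```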

Ryan Beisner (1chb1n) on 2018-02-13
tags: added: uosci
Heather Lanigan (hmlanigan) wrote :

@chris.macnaughton, did you try a reboot?

This is a doc bug, because the instructions (https://jujucharms.com/docs/stable/howto-updateseries) were incorrect. There is no expectation that the 'juju update-series' command will make changes on existing units itself.

Changed in juju:
status: New → Triaged
assignee: nobody → Heather Lanigan (hmlanigan)

I did reboot, and I ran juju update-series with the result of running `lsb_release -c -s` on the unit in question. The unit _is_ running xenial and the machine agent is starting, but the agent seems to look for /var/lib/juju/tools/$VERSION-xenial-$ARCH, which isn't being created.

https://code.launchpad.net/~chris.macnaughton/openstack-mojo-specs/upgrade-spec/+merge/337646 is a WIP set of helpers to iterate through upgrading all of the deployed units. As you can see, it looks a lot like the linked documentation, including reboots and, if coming from Trusty, updating to systemd. At the end of that, the machine agent is back up and the machine shows Xenial, but no application agents come back up.

All of my testing so far has been on a LXD deployed model

Changed in juju:
importance: Undecided → High
Changed in juju:
status: Triaged → In Progress
Heather Lanigan (hmlanigan) wrote :

I was able to reproduce what you're seeing and move forward by running the following command on the unit:

sudo ln -s /var/lib/juju/tools/2.3.2-trusty-amd64 /var/lib/juju/tools/2.3.2-xenial-amd64
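For scripting this workaround across units, a guarded version might look like the sketch below. The function name `link_tools_dir` and its parameters are hypothetical; it only reproduces the symlink from this comment and is not the fix that later went into juju core.

```shell
#!/bin/sh
# Hypothetical wrapper around the symlink workaround from this comment.
# Arguments: tools root, agent version, old series, new series, architecture.
link_tools_dir() {
    tools_root="$1"
    agent_version="$2"
    from_series="$3"
    to_series="$4"
    arch="$5"
    src="$tools_root/$agent_version-$from_series-$arch"
    dst="$tools_root/$agent_version-$to_series-$arch"
    if [ ! -d "$src" ]; then
        echo "source tools directory not found: $src" >&2
        return 1
    fi
    # Do not overwrite anything that is already in place.
    if [ -e "$dst" ] || [ -L "$dst" ]; then
        echo "already present, leaving untouched: $dst" >&2
        return 0
    fi
    ln -s "$src" "$dst"
}

# Example, run as root on the unit (commented out to avoid side effects):
# link_tools_dir /var/lib/juju/tools 2.3.2 trusty xenial amd64
```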

Ryan Beisner (1chb1n) wrote :

We're working to test and automate the documented procedure. Will this step be added to the procedure, or is there a fix underway in juju core?

Heather Lanigan (hmlanigan) wrote :

There is a fix underway within juju core.

I have found another issue to be investigated once this one is out of the way. The unit gets stuck with an "agent installing" status, though nothing appears to be wrong.

One item which will get updated in the docs is to remove the file:
/var/lib/juju/agents/unit*/charm/wheelhouse/.bootstrapped
This will trigger the charm to redo the pip installs, which is currently necessary when the Python version changes from series to series.
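A sketch of that removal step, assuming the marker path shown above. The function name `clear_wheelhouse_markers` is invented for illustration, and the agents directory is a parameter so the helper can be tried against a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: remove the wheelhouse .bootstrapped marker for every
# unit under the given agents directory, printing each file it removes.
clear_wheelhouse_markers() {
    agents_root="${1:-/var/lib/juju/agents}"
    for marker in "$agents_root"/unit*/charm/wheelhouse/.bootstrapped; do
        # If the glob matched nothing, the literal pattern is skipped here.
        [ -e "$marker" ] || continue
        rm -f "$marker" && echo "removed $marker"
    done
}

# Example (commented out to avoid side effects):
# clear_wheelhouse_markers /var/lib/juju/agents
```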

Changed in juju:
milestone: none → 2.3.4
Heather Lanigan (hmlanigan) wrote :

Caveat with this fix: You must wait for the unit agent to be up and running before running the juju update-series <machine number> <series> command.

Changed in juju:
milestone: 2.3.4 → 2.3.5
Xav Paice (xavpaice) on 2018-02-28
tags: added: canonical-bootstack
John A Meinel (jameinel) on 2018-03-26
Changed in juju:
milestone: 2.3.5 → 2.3.6
Heather Lanigan (hmlanigan) wrote :

A new approach has been implemented; PR 8412 will be removed.

For develop: https://github.com/juju/juju/pull/8552

Heather Lanigan (hmlanigan) wrote :

juju-updateseries can be run on the different units and takes the following flags:
--to-series <series> <- required
--from-series <series> <- required
--data-dir
--start-agents <- only available if a reboot after series upgrade was performed.

Caveats:
* Cannot be used on a controller. To upgrade a controller series, bootstrap a new controller and migrate existing models.
* Individual charms will require testing (there is a known issue with reactive charms).
* No support for Windows.

If moving from trusty to xenial, juju agents will be rewritten for systemd, enabled and linked.

In general, tools will be copied to an appropriately named directory and agent tool symlinks will be updated appropriately.

Junien Fridrick (axino) wrote :

Just out of curiosity, why can't this be used on a controller ?

The mongo disk format. There is a different mongo version used in trusty
and xenial. Dealing with half upgraded replicas of different versions is
not something we want to tackle.

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Felipe Reyes (freyes) wrote :

I keep seeing this error in a freshly bootstrapped 2.3.6 environment

2018-04-20 20:51:46 ERROR juju.worker.dependency engine.go:551 "unit-agent-deployer" manifold worker returned unexpected error: cannot read agent metadata in directory /var/lib/juju/tools/2.3.6-xenial-amd64: open /var/lib/juju/tools/2.3.6-xenial-amd64/downloaded-tools.txt: no such file or directory

The way I'm upgrading the machine is this -> https://gist.github.com/freyes/5327cc554bd18d4f9f6059a9a1af153b

Full machine-5.log http://paste.ubuntu.com/p/qbjZPWr9Wt/

$ juju status -m controller
Model Controller Cloud/Region Version SLA
controller c2 stsstack/stsstack 2.3.6 unsupported

App Version Status Scale Charm Store Rev OS Notes

Unit Workload Agent Machine Public address Ports Message

Machine State DNS Inst id Series AZ Message
0 started 10.5.0.8 4e9c59cf-77bc-4525-a9dc-ab9b7345f612 xenial nova ACTIVE

$ snap info juju
name: juju
summary: juju client
publisher: canonical
contact: http://jujucharms.com
license: unknown
description: |
  Through the use of charms, juju provides you with shareable, re-usable, and repeatable expressions of
  devops best practices.
commands:
  - juju
snap-id: e2CPHpB1fUxcKtCyJTsm5t3hN9axJ0yj
tracking: stable
refreshed: 2018-04-18T13:14:30-03:00
installed: 2.3.6 (4108) 54MB classic
...
$ juju --version
2.3.6-bionic-amd64

Felipe Reyes (freyes) wrote :

This is the output of my upgrade-series.sh script https://pastebin.ubuntu.com/p/VB3Zyw3HCy/

tags: added: sts
John A Meinel (jameinel) wrote :

We expect to have a file "downloaded-tools.txt" that contains a JSON blob
containing
{
 "version": "2.3.6-xenial-amd64",
 "URL": "http://path/to/download/location",
 "SHA256": "abcdefgh", # hash of the tools.tar.gz file
 "Size": 1234 # integer length of tools.tar.gz
}

Note that 2.3.6 has a bad upgrade step, so we're removing it as a target
and trying to release a 2.3.7 today.
But that doesn't specifically affect what you're seeing here. (fresh 2.3.6
is fine, and upgrading series goes via a different path).

It may be that "juju update-series" should do that, though it seems like it
could be crafted by hand as well.
Given there was one in trusty, it's possible it could just be copied across.
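If the file is crafted from the trusty copy, the "version" field is the one that names the series. The sketch below rewrites only that field. The function name `rewrite_tools_version` is hypothetical, and this thread does not confirm whether the agent accepts a file whose hash and size still describe the original tools tarball.

```shell
#!/bin/sh
# Hypothetical helper: rewrite only the "version" value in a copied
# downloaded-tools.txt, leaving URL, SHA256 and Size as they were.
# Arguments: metadata file, old version string, new version string.
rewrite_tools_version() {
    metadata_file="$1"
    old_version="$2"
    new_version="$3"
    tmp_file="$metadata_file.tmp"
    # Match the "version" key specifically so a URL containing the same
    # string is not touched.
    sed "s/\(\"version\"[[:space:]]*:[[:space:]]*\"\)$old_version\"/\1$new_version\"/" \
        "$metadata_file" > "$tmp_file" && mv "$tmp_file" "$metadata_file"
}

# Example on a copy of the trusty file (commented out to avoid side effects):
# rewrite_tools_version ./downloaded-tools.txt 2.3.6-trusty-amd64 2.3.6-xenial-amd64
```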

Felipe Reyes (freyes) wrote :

the workaround I'm using is:

cd /var/lib/juju/tools
ln -s 2.3.6-trusty-amd64 2.3.6-xenial-amd64

It works fine, maybe I should "cp -rfp" instead of using a symlink.

Is this expected to be done by the user or by juju? Knowing that, we can decide whether it's a bug in the docs or in juju.

Heather Lanigan (hmlanigan) wrote :

@freyes,

We've significantly changed the documentation about how to upgrade the series of an existing unit, after adding the juju update-series command. Everything from line 56 down in your script is now done by the command.

I just looked at the docs, and my changes aren't there. I'll track down what happened to them.

While that's happening: after the do-release-upgrade step, run the command described in #11 on each unit instead of the script. Then do the last step: juju update-series <machine-#> xenial

A step is missing from the script: replacing the links for the machine and unit tools in /var/lib/juju/tools so that they point to /var/lib/juju/tools/2.3.6-xenial-amd64
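The missing relink step could be sketched as follows. The function name `repoint_agent_tools` is hypothetical, and the assumption that the per-agent links are named `machine-*` and `unit-*` inside the tools directory is not spelled out in this thread.

```shell
#!/bin/sh
# Hypothetical helper: repoint every machine-* and unit-* symlink in the
# tools directory at the given versioned tools directory.
# Arguments: tools root, name of the new versioned directory.
repoint_agent_tools() {
    tools_root="$1"
    new_dir="$2"
    if [ ! -d "$tools_root/$new_dir" ]; then
        echo "target tools directory not found: $tools_root/$new_dir" >&2
        return 1
    fi
    for link in "$tools_root"/machine-* "$tools_root"/unit-*; do
        # Only touch existing symlinks; skip unmatched glob patterns.
        [ -L "$link" ] || continue
        # -n replaces the link itself instead of descending into its target.
        ln -sfn "$tools_root/$new_dir" "$link"
        echo "$(basename "$link") -> $new_dir"
    done
}

# Example, run as root on the unit (commented out to avoid side effects):
# repoint_agent_tools /var/lib/juju/tools 2.3.6-xenial-amd64
```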

@hmlanigan I'm testing with the new juju-updateseries command but I'm running into an issue: when I run juju-updateseries (without `--start-agents`) before rebooting into the new distribution, the agents don't seem to start on boot; is it required to run after the reboot and with `--start-agents`?

ubuntu@juju-76e842-icey-0:~$ sudo systemctl status jujud*
sudo: unable to resolve host juju-76e842-icey-0
ubuntu@juju-76e842-icey-0:~$

The agents don't seem to be registered

Heather Lanigan (hmlanigan) wrote :

@chris.macnaughton,

There are two ways to use the command:
1. do-release-upgrade, reboot, juju-updateseries --start-agents
2. do-release-upgrade, juju-updateseries, reboot

Systemd is not really running until the reboot occurs, so starting the agents won't work. Either way is fine, just how you think would be best.

If you run after reboot, but don't use --start-agents, the agents won't be started.
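The two orderings can be written out as a dry-run helper that only prints the steps. The function name `upgrade_steps` and the labels `reboot-first` / `reboot-last` are made up for illustration. The flags are the ones listed in #11, and the space-separated flag syntax is assumed from that listing.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the steps for one of the two supported
# orderings. Nothing is executed.
# Arguments: ordering (reboot-first | reboot-last), old series, new series.
upgrade_steps() {
    order="$1"
    from_series="$2"
    to_series="$3"
    base="juju-updateseries --from-series $from_series --to-series $to_series"
    case "$order" in
        reboot-first)
            # systemd is running after the reboot, so agents can be started.
            echo "do-release-upgrade"
            echo "reboot"
            echo "$base --start-agents"
            ;;
        reboot-last)
            # Agents are expected to come up as part of the reboot.
            echo "do-release-upgrade"
            echo "$base"
            echo "reboot"
            ;;
        *)
            echo "unknown ordering: $order" >&2
            return 1
            ;;
    esac
}

# Example:
# upgrade_steps reboot-first trusty xenial
```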

Regarding #19, do you have any logs you can show me? I haven't seen that behavior. Are the jujud agent files linked in /etc/systemd/system/multi-user.target.wants/? Can you start them by hand in this case?

Regarding #20, is this before or after reboot? I have found that sometimes systemctl status jujud* doesn't give results, which is weird, but systemctl status jujud-machine-0 works. Not sure why.
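One possible explanation for the empty `systemctl status jujud*` output: per the systemctl man page, glob patterns are matched only against units currently loaded in memory, and an unquoted `jujud*` can also be expanded by the shell first if matching files exist in the working directory. Checking the links on disk avoids both issues. The sketch below uses a hypothetical function name, `list_jujud_links`, and assumes the unit files start with `jujud-`, as in `jujud-machine-0` above.

```shell
#!/bin/sh
# Hypothetical helper: list jujud unit links in the multi-user target's
# wants directory. Returns non-zero if none are found.
list_jujud_links() {
    wants_dir="${1:-/etc/systemd/system/multi-user.target.wants}"
    status=1
    for entry in "$wants_dir"/jujud-*; do
        # Accept regular entries and dangling symlinks; skip unmatched globs.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        basename "$entry"
        status=0
    done
    return "$status"
}

# Example (commented out to avoid side effects):
# list_jujud_links || echo "no jujud units are enabled for multi-user.target"
# systemctl status jujud-machine-0
```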

Ryan Beisner (1chb1n) on 2018-04-30
tags: added: series-upgrade