relation data lost during upgrade to juju 2.8.1

Bug #1890828 reported by Laurent Sesquès
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Heather Lanigan
Milestone: 2.8.2

Bug Description

Hi,

I recently upgraded a juju controller from 2.7.6 to 2.8.1.
The upgrade completed successfully and the controller reported version 2.8.1.
I then proceeded to upgrade a model managed by this controller.
The upgrade didn't go well, as a service lost relation data.

`juju run --unit application/0 "relation-ids"` still showed the relation ("shared-db:29").
But `juju run --unit application/0 "relation-list -r shared-db:29"` gave an empty output.

This could only be solved by re-running the shared-db-relation-changed hooks, and some of the relation values were re-initialized.

I'd rather not attach the full controller logs, but we can provide excerpts or communicate the log files directly on a non-public server.

Tags: upgrade-juju
Revision history for this message
Heather Lanigan (hmlanigan) wrote :

So far I've been unable to reproduce this with either 2.7.6 or 2.7.8. Perhaps 10 units is insufficient.

Is it possible to get the output of juju status, or juju export-bundle for the model in question?

We can make this bug a private one, if security around responses and data is a concern.

Pen Gale (pengale)
tags: added: upgrade-juju
Heather Lanigan (hmlanigan) wrote :

Is it also possible to get the logs for the unit in question which failed, and its machine?

Pen Gale (pengale)
Changed in juju:
status: New → Incomplete
Junien F (axino) wrote :

Logs provided OOB.

Changed in juju:
status: Incomplete → New
Heather Lanigan (hmlanigan) wrote :

I believe I've reproduced this now.

Changed in juju:
status: New → In Progress
assignee: nobody → Heather Lanigan (hmlanigan)
Heather Lanigan (hmlanigan) wrote :

Our working hypothesis is that this is caused by a race during upgrade.

The machine agent is moving state for every unit from the machine to the controller. However, at the same time the units are restarting after the upgrade and expecting their state to be on the controller. On machines with several units, the move could take long enough that a unit's state is not on the controller yet when the unit restarts.

The unit agent starts with a fresh state, like it's just installing, if no data from the controller is available. This could impact relations, hook execution knowledge or storage.
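
To make the hypothesis concrete, here is a minimal, deterministic sketch of the suspected race. It is an illustration only: `start_unit`, `simulate`, the `restart_after` parameter and the state layout are all hypothetical and do not come from the Juju code base.

```python
# Hypothetical model of the suspected race; none of these names come from Juju.

def start_unit(unit, controller_state):
    """Return the state a unit agent would start with.

    If the controller has no record for the unit yet, the agent assumes a
    fresh install and starts with empty state.
    """
    return controller_state.get(unit, {"relations": [], "fresh": True})


def simulate(units, local_state, restart_after):
    """Move unit state to the controller one unit at a time.

    All unit agents restart once `restart_after` units have been moved,
    which may be before the migration has finished.
    """
    controller_state = {}
    started = {}
    for moved, unit in enumerate(units, start=1):
        controller_state[unit] = local_state[unit]
        if moved == restart_after:
            # Unit agents restart while the migration may still be in flight.
            started = {u: start_unit(u, controller_state) for u in units}
    return started


units = ["app/0", "app/1", "app/2"]
local_state = {u: {"relations": ["shared-db:29"], "fresh": False} for u in units}

# Restart after only one unit has been moved: the other two start fresh.
started = simulate(units, local_state, restart_after=1)
lost = [u for u, s in started.items() if s["fresh"]]
print(lost)
```

In this model, restarting only after all three units have been moved (`restart_after=3`) leaves no unit with fresh state, which matches the idea that the problem only shows up when the move is slow relative to the unit restarts.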

Changed in juju:
importance: Undecided → High
Heather Lanigan (hmlanigan) wrote :

Unit agents do not start upgrade until after the machine agent is finished.

Heather Lanigan (hmlanigan) wrote :

Here is a change to remove the race condition between the time the machine agent upgrades and the unit agent upgrades. https://github.com/juju/juju/pull/11897

Reproducing this issue brought down my system. I don't have a reproducer small enough to verify that this fixes the problem. That can be done with the candidate.
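
The general idea of ordering the two upgrades can be sketched as follows. This is not the code from the pull request, whose implementation is not shown in this thread; it is a generic gate built on `threading.Event`, and every name in it (`migration_done`, `machine_agent`, `unit_agent`) is an assumption made for illustration.

```python
import threading

# Hypothetical sketch: names are illustrative, not taken from the Juju code base.
migration_done = threading.Event()
controller_state = {}
local_state = {f"app/{i}": ["shared-db:29"] for i in range(3)}
started = {}


def machine_agent():
    """Move every unit's state to the controller, then open the gate."""
    for unit, relations in local_state.items():
        controller_state[unit] = relations
    migration_done.set()


def unit_agent(unit):
    """Block until the machine agent has finished, then read state."""
    migration_done.wait()
    started[unit] = controller_state.get(unit, [])


# Start the unit agents first to show that they wait for the machine agent.
threads = [threading.Thread(target=unit_agent, args=(u,)) for u in local_state]
for t in threads:
    t.start()
machine = threading.Thread(target=machine_agent)
machine.start()
for t in threads + [machine]:
    t.join()

print(started)
```

Because each unit agent blocks on the event, it can only read its state after the machine agent has finished moving all of it, regardless of how the threads are scheduled.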

Changed in juju:
milestone: none → 2.8.2
status: In Progress → Fix Committed
Junien F (axino) wrote :

@hmlanigan: Thanks! When is juju 2.8.2 expected to be released?

Heather Lanigan (hmlanigan) wrote : Re: [Bug 1890828] Re: relation data lost during upgrade to juju 2.8.1

We're trying to get 2.8.2 out the door now; there are a few final bugs to squash. Hoping for next week.

On Thu, Aug 13, 2020 at 9:01 AM Junien Fridrick <email address hidden>
wrote:

> @hmlanigan : Thanks ! When is juju 2.8.2 expected to be released ?

Changed in juju:
status: Fix Committed → Fix Released