upgrade to 2.3.6 failed: the dotted field is not valid for storage

Bug #1765722 reported by Doug Parrish
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
Critical
John A Meinel
2.3
Fix Released
Critical
John A Meinel

Bug Description

upgrade-juju of controller model, from 2.3.5 to 2.3.6 fails. From controller model machine-0's log:

2018-04-19 22:49:15 ERROR juju.worker.upgradesteps worker.go:379 upgrade from 2.3.5 to 2.3.6 for "machine-0" failed (will retry): ensure container-image-stream config defaults to released: The dotted field 'mysql.passwd' in 'settings.mysql.passwd' is not valid for storage.

MongoDB apparently doesn't like keys with dots.

Felipe Reyes (freyes)
tags: added: sts
Revision history for this message
Felipe Reyes (freyes) wrote :

Here is a reproducer:

 ~ $ juju bootstrap --bootstrap-series xenial --agent-version 2.3.5 lxd lxd
Creating Juju controller "lxd" on lxd/localhost
Looking for packaged Juju agent version 2.3.5 for amd64
To configure your system to better support LXD containers, please see: https://github.com/lxc/lxd/blob/master/doc/production-setup.md
Launching controller instance(s) on localhost/localhost...
 - juju-00402b-0 (arch=amd64)
Installing Juju agent on bootstrap instance
Fetching Juju GUI 2.12.1
Waiting for address
Attempting to connect to 10.0.4.173:22
Connected to 10.0.4.173
Running machine configuration script...
Bootstrap agent now started
Contacting Juju controller at 10.0.4.173 to verify accessibility...
Bootstrap complete, "lxd" controller now available
Controller machines are in the "controller" model
Initial model "default" added
 ~ $ juju deploy cs:percona-cluster
Located charm "cs:percona-cluster-261".
Deploying charm "cs:percona-cluster-261".
 ~ $ juju run --unit percona-cluster/0 leader-get
leader-ip: 10.0.4.37
mysql.passwd: g6yHsw3fJysLKfwkVpxxY2CBKS7mWFsSsLdKGzb5
root-password: g6yHsw3fJysLKfwkVpxxY2CBKS7mWFsSsLdKGzb5
sst-password: JwMGKp7ysm9963xGWMt9zrHt2psNKzBX2ykc3CZcg
 ~ $ juju model-config -m controller logging-config="<root>=DEBUG;unit=DEBUG"
 ~ $ juju model-config logging-config="<root>=DEBUG;unit=DEBUG"
 ~ $ juju upgrade-juju -m controller --agent-version 2.3.6
best version:
    2.3.6
started upgrade to 2.3.6
 ~ $ juju upgrade-juju --agent-version 2.3.6
ERROR upgrade in progress (upgrade in progress)
 ~ $ juju status -m controller
Model Controller Cloud/Region Version Notes SLA
controller lxd localhost/localhost 2.3.6 upgrade in progress since "2018-04-20T14:08:20Z" unsupported

App Version Status Scale Charm Store Rev OS Notes

Unit Workload Agent Machine Public address Ports Message

Machine State DNS Inst id Series AZ Message
0 error 10.0.4.173 juju-00402b-0 xenial Running

Error in machine-0.log

2018-04-20 14:08:21 ERROR juju.upgrade upgrade.go:140 upgrade step "ensure container-image-stream config defaults to released" failed: The dotted field 'mysql.passwd' in 'settings.mysql.passwd' is not valid for storage.
2018-04-20 14:08:21 ERROR juju.worker.upgradesteps worker.go:379 upgrade from 2.3.5 to 2.3.6 for "machine-0" failed (will retry): ensure container-image-stream config defaults to released: The dotted field 'mysql.passwd' in 'settings.mysql.passwd' is not valid for storage.

Full machine-0.log -> http://paste.ubuntu.com/p/GzGfyMtsv6/

At the moment any juju controller that hosts a model with percona-cluster (the default for OpenStack installations) will get into this broken state when upgrading to 2.3.6

tags: added: openstack
tags: added: docteam
Changed in juju:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 2.3.7
Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 1765722] Re: upgrade to 2.3.6 failed: the dotted field is not valid for storage

we've never liked dotted fields. Is this just a case where bad data made it
in at some point in the past and we never excised it?

John
=:->

On Fri, Apr 20, 2018, 18:56 Nicholas Skaggs <email address hidden>
wrote:

> ** Changed in: juju
> Status: New => Triaged
>
> ** Changed in: juju
> Importance: Undecided => Critical
>
> ** Changed in: juju
> Milestone: None => 2.3.7
>
> --
> You received this bug notification because you are subscribed to juju.
> Matching subscriptions: juju bugs
> https://bugs.launchpad.net/bugs/1765722
>
> Title:
> upgrade to 2.3.6 failed: the dotted field is not valid for storage
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/juju/+bug/1765722/+subscriptions
>

Revision history for this message
Felipe Reyes (freyes) wrote :

john, the percona-cluster charm has been storing the password in the "mysql.passwd" key for a long time, I could find mentions of this key in relation-set/leader-set related code in 2015 -> https://github.com/openstack/charm-percona-cluster/commit/5e4fdcb35a1f28e1f25cbbcce2f90d2d5875b3da

I've never seen the juju tools complain about dots in keys

Revision history for this message
Doug Parrish (dparrish) wrote :

Comparison of percona-cluster leader settings in MongoDB (settings collection) for healthy (2.2.2) and one suffering this bug: https://paste.ubuntu.com/p/xxRKn4ncfd/

Revision history for this message
John A Meinel (jameinel) wrote :

IIRC to get around mongo not liking '.' we actually escape it to "wide
Unicode ." which is why you see the extra space. My guess is something in
upgrade might be decoding the "." but failing to re-encode it.

John
=:->

On Fri, Apr 20, 2018, 20:05 Doug Parrish <email address hidden> wrote:

> Comparison of percona-cluster leader settings in MongoDB (settings
> collection) for healthy (2.2.2) and one suffering this bug:
> https://paste.ubuntu.com/p/xxRKn4ncfd/
>
> --
> You received this bug notification because you are subscribed to juju.
> Matching subscriptions: juju bugs
> https://bugs.launchpad.net/bugs/1765722
>
> Title:
> upgrade to 2.3.6 failed: the dotted field is not valid for storage
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/juju/+bug/1765722/+subscriptions
>

John A Meinel (jameinel)
Changed in juju:
assignee: nobody → John A Meinel (jameinel)
status: Triaged → In Progress
Revision history for this message
John A Meinel (jameinel) wrote :

Ran into this as:
2018-04-22 10:55:16 ERROR juju.upgrade upgrade.go:140 upgrade step "ensure container-image-stream config defaults to released" failed: The dotted field 'zfs.pool_name' in 'settings.zfs.pool_name' is not valid for storage.
2018-04-22 10:55:16 ERROR juju.worker.upgradesteps worker.go:379 upgrade from 2.3.5 to 2.3.6 for "machine-0" failed (will retry): ensure container-image-stream config defaults to released: The dotted field 'zfs.pool_name' in 'settings.zfs.pool_name' is not valid for storage.

Revision history for this message
John A Meinel (jameinel) wrote :

The issue is that func UpgradeContainerImageStreamDefault(st *State) error {

is not properly filtering *what* settings get touched. It is adding "container-image-stream" to *every* config, not just the model config.
And because we are
a) doing raw database access
b) rewriting all docs

we end up reading and writing out every document. What we should do:
1) Only include the model settings document. Not all of the application documents.
2) Properly decode and encode the contents of documents that we do touch.

I'm not sure how we are decoding as the code doesn't have any decode steps. So I would have expected it to pass through those keys untouched. Regardless, should be easy enough to fix.

Revision history for this message
John A Meinel (jameinel) wrote :

The one question is whether we need yet another upgrade step that will strip 'container-image-stream' from all the other settings docs that aren't model docs.

I don't know of any charms that have it as part of their existing config.

Revision history for this message
John A Meinel (jameinel) wrote :
Changed in juju:
milestone: 2.3.7 → 2.4-beta2
Revision history for this message
John A Meinel (jameinel) wrote :

Note that if you've upgraded to 2.3.6 and are experiencing this bug it is
hard to get to a fixed version. Because the upgrade step itself is broken
it won't go past it to let it notice a new version to install.

It should be possible with some manual surgery to get rolled back to 2.3.5
or to get up to a 2.3.7 but the standard "juju upgrade-juju" won't work.

John
=:->

On Sun, Apr 22, 2018, 18:30 John A Meinel <email address hidden> wrote:

> https://github.com/juju/juju/pull/8638
>
> ** Also affects: juju/2.3
> Importance: Undecided
> Status: New
>
> ** Changed in: juju/2.3
> Importance: Undecided => Critical
>
> ** Changed in: juju/2.3
> Status: New => In Progress
>
> ** Changed in: juju/2.3
> Milestone: None => 2.3.7
>
> ** Changed in: juju/2.3
> Assignee: (unassigned) => John A Meinel (jameinel)
>
> ** Changed in: juju
> Milestone: 2.3.7 => 2.4-beta2
>
> --
> You received this bug notification because you are a bug assignee.
> Matching subscriptions: juju bugs
> https://bugs.launchpad.net/bugs/1765722
>
> Title:
> upgrade to 2.3.6 failed: the dotted field is not valid for storage
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/juju/+bug/1765722/+subscriptions
>

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (3.6 KiB)

At the moment, if you try to upgrade to 2.3.6 you'll end up with a Transaction that cannot be applied, which makes it very hard to get out of this situation. (transactions don't naturally go to aborted if they can't be applied, only if the assertions fail, because they don't actually have enough information to implement rollback)

These are the steps I did to reproduce and recover:

$ juju bootstrap lxd --debug --agent-version=2.3.5
$ juju upgrade-juju --build-agent #using a source tree at juju-2.3.6
$ juju status
# shows that machine 0 is in an error state
# reading /var/log/juju/machine-0.log shows
2018-04-23 05:33:28 INFO juju.upgrade upgrade.go:138 running upgrade step: ensure container-image-stream config defaults to released
2018-04-23 05:33:29 ERROR juju.upgrade upgrade.go:140 upgrade step "ensure container-image-stream config defaults to released" failed: The dotted field 'zfs.pool_name' in 'settings.zfs.pool_name' is not valid for storage.
2018-04-23 05:33:29 ERROR juju.worker.upgradesteps worker.go:379 upgrade from 2.3.5 to 2.3.6.1 for "machine-0" failed (will retry): ensure container-image-stream config defaults to released: The dotted field 'zfs.pool_name' in 'settings.zfs.pool_name' is not valid for storage.

# get mgopurge and juju-force-upgrade onto the system
juju scp -m controller mgopurge juju-force-upgrade 0:.
# stop the agent
$ juju ssh -m controller 0
$$ sudo su -
$$ systemctl stop jujud-machine-0
# get the right credentials to connect to mongo
$$ agent=$(cd /var/lib/juju/agents; echo machine-*)
$$ pw=$(sudo grep statepassword /var/lib/juju/agents/${agent}/agent.conf | cut '-d ' -sf2)
$$ /usr/lib/juju/mongo3.2/bin/mongo --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/juju

# state 2 is prepared, state 4 is 'applying' which is the one we're currently broken on
> db.txns.find({"o.c": "settings", "s": {"$in": [2, 4]}}, {"_id": 1}).pretty()
{ "_id" : ObjectId("5add702921919b059b1a2c36") }
{ "_id" : ObjectId("5add70a121919b059b1a2c3c") }
{ "_id" : ObjectId("5add711921919b059b1a2c42") }
{ "_id" : ObjectId("5add719121919b059b1a2c48") }
{ "_id" : ObjectId("5add720b21919b06bc12210b") }
{ "_id" : ObjectId("5add728321919b06bc122111") }

# forcibly remove these transactions
> db.txns.remove({"o.c": "settings", "s": {"$in": [2, 4]}})
WriteResult({ "nRemoved" : 6 })

# fix up the transactions in the db using mgopurge
$$ ~ubuntu/mgopurge -username ${agent} -password ${pw} -ssl

# find the controller UUID. Could also have been "juju models --uuid"
> db.models.find({"name": "controller"}, {"_id": 1})
{ "_id" : "bcc296d8-d6e5-427c-8339-d5560848e6f5" }

# Remove the "upgrade in progress" document
> db.upgradeInfo.remove({"_id": "current"})

# update the desired agent version
$$ ~ubuntu/juju-force-upgrade "bcc296d8-d6e5-427c-8339-d5560848e6f5" 2.3.5

# restart the controller
$$ systemctl start jujud-machine-0

# check that we're properly back to 2.3.5:
$ juju status
Model Controller Cloud/Region Version Notes SLA
controller lxd lxd 2.3.5 upgrade available: 2.3.6 unsupported

$$ grep "running jujud" /var/log/...

Read more...

Revision history for this message
John A Meinel (jameinel) wrote :

https://github.com/juju/juju/pull/8642 merges 2.3 into develop along with an updated fix for the settingsMap serialization.

Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
Doug Parrish (dparrish) wrote :

@jameinel - the recovery procedure in comment #11 was exercised and found to be successful with the addition of resetting the machine-0 symlink to the correct tools in /var/lib/juju/tools. Also, updated settings collection directly to update settings.agent-version.

juju:PRIMARY> db.models.find({"name": "controller"},{"_id": 1})
{ "_id" : "2699eb14-5f87-4821-8815-fe026c24b56b" }

juju:PRIMARY> db.settings.find({"_id": "2699eb14-5f87-4821-8815-fe026c24b56b:e"},{"settings.agent-version": 1}).pretty()
{
 "_id" : "2699eb14-5f87-4821-8815-fe026c24b56b:e",
 "settings" : {
  "agent-version" : "2.3.6"
 }
}

juju:PRIMARY> db.settings.update({"_id": "2699eb14-5f87-4821-8815-fe026c24b56b:e"},{$set: {"settings.agent-version": "2.3.5"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

juju:PRIMARY> db.settings.find({"_id": "<controller_model_uuid>:e"},{"settings.agent-version": 1}).pretty()
{
 "_id" : "2699eb14-5f87-4821-8815-fe026c24b56b:e",
 "settings" : {
  "agent-version" : "2.3.5"
 }
}

KB article[0] documenting this procedure is currently under review.

Thank you for quick resolution.

[0] https://support.canonical.com/ua/s/article/Recover-Juju-controller-from-failed-2-3-6-upgrade-juju

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Marking as Fix Released as 2.4.0 is out.

Changed in juju:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.