deploying with default binding prevents upgrade-charm

Bug #1671428 reported by Sandor Zeestraten
This bug affects 1 person
Affects          Status         Importance   Assigned to     Milestone
Canonical Juju   Fix Released   Critical     John A Meinel
2.1              Fix Released   Critical     John A Meinel

Bug Description

# Version
Juju 2.1.1
MAAS 2.1.3

# Issue
Deployed a bundle of percona-cluster in HA with hacluster charms (see bundle.yaml).

Upgrading the percona-cluster charm to a new revision resulted in an error message which I do not understand.

The charms are now stuck in a permanent upgrade state (see juju status output).

# Error message during upgrade
zeestrat $ juju upgrade-charm mysql
Added charm "cs:percona-cluster-250" to the model.
ERROR cannot upgrade application "mysql" to charm "cs:percona-cluster-250": The update path 'bindings.' contains an empty field name, which is not allowed.

# Juju spaces
Space Subnets
admin 10.42.2.0/23
os-data 172.16.100.0/23
os-internal 172.16.102.0/23
ceph-public 172.16.104.0/23

Sandor Zeestraten (szeestraten) wrote :

Output from juju status --format yaml

Anastasia (anastasia-macmood) wrote :

May have the same underlying cause as bug #1671489...

Sandor Zeestraten (szeestraten) wrote :

@anastasia-macmood Maybe. Is there a way to confirm/deny?

Juju does not seem to display any errors during the deployment of revision 249 of the percona-cluster charm. It only crops up when trying to upgrade to revision 250.

Anastasia (anastasia-macmood) wrote :

I could not find "bindings." or anything related in 250's metadata.yaml. Could you attach the one for 249? I am trying to determine where this is coming from...

Sandor Zeestraten (szeestraten) wrote :

I'm not sure what you are asking for. The metadata for revision 249?
Is it not possible to do "charm pull percona-cluster-249" or am I missing something?

John A Meinel (jameinel) wrote :

I think this is a separate bug from #1671489 but caused by similar issues. I believe that error is actually from MongoDB and it happens that our representation of objects in the database is different during upgrade (which would be bad, but is still possible).

John A Meinel (jameinel) wrote :

'bindings.' is likely to be coming from something to do with the bundle ('bindings' is in the bundle, but not in the charm.) 'bindings.' would probably map to the bundle lines:
    bindings:
      "": admin

from the bundle.yaml file.

I'm trying to reproduce now, though I am running into bug #1671489 while trying to test this bug.

John A Meinel (jameinel) wrote :

I believe I can reproduce it with this bundle:
$ cat >bundle.yaml <<EOF
series: xenial
services:
  ul:
    charm: "cs:~jameinel/ubuntu-lite-4"
    num_units: 2
    to:
      - "0"
      - "lxd:0"
    bindings:
      "": space-0
machines:
  "0":
    series: xenial
    constraints: "arch=amd64"
EOF

$ juju deploy ./bundle.yaml
# wait for it to complete
$ juju upgrade-charm ul
Added charm "cs:~jameinel/ubuntu-lite-5" to the model.
ERROR cannot upgrade application "ul" to charm "cs:~jameinel/ubuntu-lite-5": The update path 'bindings.' contains an empty field name, which is not allowed.

Changed in juju:
status: New → Triaged
importance: Undecided → High
assignee: nobody → John A Meinel (jameinel)
summary: - Upgrading charm results in error message and permanent upgrading state
+ deploying with default binding in bundle prevents upgrade

John A Meinel (jameinel) wrote : Re: deploying with default binding in bundle prevents upgrade

Note things stay broken:
$ juju remove-application ul
ERROR cannot destroy application "ul": The update path 'bindings.' contains an empty field name, which is not allowed.

John A Meinel (jameinel) wrote :

After trying to do "juju remove-machine --force 0" to clean up, we still are broken:
2017-03-09 23:26:38 DEBUG juju.worker.dependency engine.go:500 "mgo-txn-resumer" manifold worker stopped: cannot resume transactions: The update path 'bindings.' contains an empty field name, which is not allowed.
2017-03-09 23:26:38 ERROR juju.worker.dependency engine.go:547 "mgo-txn-resumer" manifold worker returned unexpected error: cannot resume transactions: The update path 'bindings.' contains an empty field name, which is not allowed.

If the transaction cannot be applied, I would have thought it would be flagged as invalid and not continually try to re-apply it.
Anyway, I'm digging in to try and find where the invalid TXN is being generated.
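
The error text suggests that MongoDB rejects any dotted update path that contains an empty component. A minimal sketch of a pre-check that could flag such an update before it is queued is shown here. The helper name emptyFieldPaths is hypothetical; it is not part of Juju or mgo/txn, and the sample update is made up for illustration.

```javascript
// Hypothetical helper, not part of Juju or mgo/txn: report update paths
// that contain an empty field name, e.g. "bindings." or "a..b".
function emptyFieldPaths(update) {
  const bad = [];
  // update looks like { $set: { path: value }, $unset: { path: 1 } }
  for (const fields of Object.values(update)) {
    for (const path of Object.keys(fields)) {
      if (path.split(".").some((part) => part === "")) {
        bad.push(path);
      }
    }
  }
  return bad;
}

// Illustrative update document with one valid and one invalid path.
const sampleUpdate = {
  $set: { "bindings.ubuntu": "space-0" },
  $unset: { "bindings.": 1 },
};
console.log(emptyFieldPaths(sampleUpdate));
```

A check of this kind would only reject the operation up front; it would not repair a transaction that is already in the queue.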

John A Meinel (jameinel) wrote :

Given that this results in a system that cannot be torn down cleanly, and TXNs that wedge the TXN resumer, it is definitely critical.

Changed in juju:
importance: High → Critical
milestone: none → 2.1.2
status: Triaged → In Progress

John A Meinel (jameinel) wrote :

The database entry looks like this:
juju:PRIMARY> db.endpointbindings.find().pretty()
{
        "_id" : "6913455a-0958-4d56-8b8e-6e4d535b9a78:a#ul",
        "bindings" : {
                "ubuntu" : "space-0",
                "" : "space-0"
        },
        "txn-revno" : NumberLong(2),
        "model-uuid" : "6913455a-0958-4d56-8b8e-6e4d535b9a78",
        "txn-queue" : [
                "58c1d05c2d11fb364eb1e9df_43e47664",
                "58c1e3ab2d11fb364eb20c41_58770e45"
        ]
}

The transaction that is modifying it looks like:
                {
                        "c" : "endpointbindings",
                        "d" : "6913455a-0958-4d56-8b8e-6e4d535b9a78:a#ul",
                        "a" : {
                                "txn-revno" : NumberLong(2)
                        },
                        "u" : {
                                "$set" : {
                                        "bindings.ubuntu" : "space-0"
                                },
                                "$unset" : {
                                        "bindings." : 1
                                }
                        }
                },

I don't know why it is trying to unset the bindings "" field, nor do I know what a better syntax is for the set/unset commands. It is possible that this would be better as:
"u": {
  "$set": {
    "bindings": {
      "ubuntu": "space-0"
    }
  }
}

Which thus wouldn't have "" as a binding, though really it should be:
"u": {
  "$set": {
    "bindings": {
      "ubuntu": "space-0",
      "": "space-0"
    }
  }
}
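
To illustrate how a path like 'bindings.' can arise, here is a small sketch that builds an update document from an old and a new bindings map. It is an assumption about the general mechanism, not Juju's actual code (Juju is written in Go), and the function names perKeyUpdate and wholeMapUpdate are hypothetical.

```javascript
// Illustrative only; this is not Juju's implementation.
// Address each endpoint with a dotted path such as "bindings.ubuntu".
function perKeyUpdate(oldBindings, newBindings) {
  const update = { $set: {}, $unset: {} };
  for (const [endpoint, space] of Object.entries(newBindings)) {
    update.$set["bindings." + endpoint] = space;
  }
  for (const endpoint of Object.keys(oldBindings)) {
    if (!(endpoint in newBindings)) {
      // An endpoint named "" yields the path "bindings." here.
      update.$unset["bindings." + endpoint] = 1;
    }
  }
  return update;
}

// Alternative: replace the whole sub-document, so no dotted path is built.
function wholeMapUpdate(newBindings) {
  return { $set: { bindings: { ...newBindings } } };
}

const oldBindings = { ubuntu: "space-0", "": "space-0" };
const newBindings = { ubuntu: "space-0" };
console.log(JSON.stringify(perKeyUpdate(oldBindings, newBindings)));
console.log(JSON.stringify(wholeMapUpdate(newBindings)));
```

With these inputs the per-key form produces the same "$set" and "$unset" shape as the transaction quoted above, while the whole-map form never constructs a path from the empty endpoint name.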

John A Meinel (jameinel)
Changed in juju:
milestone: 2.1.2 → 2.2-alpha1

John A Meinel (jameinel) wrote :

https://github.com/juju/juju/pull/7085

should be a fix for it. Testing live seems to show good things.

John A Meinel (jameinel) wrote :

I did find a mongodb query to fix the transaction queue if you run into this. You can run something like "dialmgo":

And then from that shell you supposedly could run something like:

db.txns.find({"s": 4, "o.c": "endpointbindings"}
).forEach(function(t) {
  t.o.forEach(function(op) {
    if (op.c == "endpointbindings") {
        printjson(op.u);
        delete op.u;
    }
  } );
  printjson(db.txns.update({_id: t._id}, {$set: {o: t.o}}));
  printjson(t._id);
  // printjson(t.o);
})

This walks the txns queue for updates that affect 'endpointbindings' and removes the 'update' field so that they no longer try to apply "$unset": "bindings."

It does delete the attribute that we want to delete, but the final 'db.txns.update' doesn't succeed because of:

{
        "nMatched" : 0,
        "nUpserted" : 0,
        "nModified" : 0,
        "writeError" : {
                "code" : 52,
                "errmsg" : "The dollar ($) prefixed field '$ne' in 'o.0.a.life.$ne' is not valid for storage."
        }
}

So for whatever reason, via golang Juju is able to insert records that have keys that start with "$blah", but I haven't figured out how to modify those same documents via the mongo shell.

John A Meinel (jameinel) wrote :

So... I found a much simpler syntax, which only wasn't working because I had originally used it wrong. This query should fix the bad txn entry:

db.txns.update({"s": 4, "o.c": "endpointbindings"}, {"$unset": {"o.$.u": 1}})

As for how to get access to mongo, you can use:
 juju ssh -m controller 0
dialmgo() {
    agent=$(cd /var/lib/juju/agents; echo machine-*)
    pw=$(sudo cat /var/lib/juju/agents/${agent}/agent.conf |grep statepassword |awk '{ print $2 }')
    /usr/lib/juju/mongo3.2/bin/mongo --ssl --sslAllowInvalidCertificates -u ${agent} -p $pw localhost:37017/juju --authenticationDatabase admin
}

dialmgo

Which should drop you into a mongo shell connected to the database that will let you run that query. Once it's run, juju's transaction resumer should finally be able to kick in and finish without dying, which will move that transaction from "pending" 4 into "applied" 6 (possibly aborted 5).

Regardless, that should give you a chance to upgrade to Juju 2.1.2 which should have the fix for this (and the charm should be upgraded at that point anyway.)
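
For readers unfamiliar with the positional "$" in "o.$.u", the following in-memory sketch mimics what the db.txns.update query above does to a single document. It does not talk to MongoDB, the function name unsetFirstMatchingUpdate is hypothetical, and the sample transaction is invented for illustration.

```javascript
// In-memory illustration only; it does not connect to MongoDB.
// Mimics {"$unset": {"o.$.u": 1}} with the query
// {"s": 4, "o.c": "endpointbindings"}: "$" stands for the first element
// of the "o" array that matched the query condition.
function unsetFirstMatchingUpdate(txn) {
  if (txn.s !== 4) return false; // only transactions in state 4
  const op = txn.o.find((o) => o.c === "endpointbindings");
  if (!op) return false;
  delete op.u; // drop the update that carries the invalid path
  return true;
}

// Made-up transaction document for illustration.
const sampleTxn = {
  _id: "example",
  s: 4,
  o: [
    { c: "applications", d: "ul", u: { $set: { charmurl: "example" } } },
    { c: "endpointbindings", d: "a#ul", u: { $unset: { "bindings.": 1 } } },
  ],
};
console.log(unsetFirstMatchingUpdate(sampleTxn));
console.log(JSON.stringify(sampleTxn.o));
```

Note that a mongo shell update() changes only one matching document by default, so the query may need repeating (or the multi option) if several transactions are stuck.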

Sandor Zeestraten (szeestraten) wrote :

@jameinel Thanks for the fix. I'll try it out on a fresh deployment with Juju 2.1.2 when available.

Is it supposed to be out today or next week?

John A Meinel (jameinel) wrote : Re: deploying with default binding prevents upgrade

It needs to land and run through CI. If we can get CI to give a bless we might release this afternoon. We've talked about also doing a proposed release and running some of our extended testing on it over the weekend before we release 2.1.2.

summary: - deploying with default binding in bundle prevents upgrade
+ deploying with default binding prevents upgrade
John A Meinel (jameinel)
summary: - deploying with default binding prevents upgrade
+ deploying with default binding prevents upgrade-charm
John A Meinel (jameinel)
Changed in juju:
status: In Progress → Fix Committed

Ante Karamatić (ivoks) wrote :

With 2.1.2:

  contrail-webui:
    charm: /home/ubuntu/build/charm-contrail-webui/trusty/contrail-webui
    constraints: *oam-space-constraint
    bindings:
      "": *internal-space

ubuntu@maas:/tmp⟫ juju upgrade-charm --switch ./trusty/contrail-webui contrail-webui
Added charm "local:trusty/contrail-webui-2" to the model.
ERROR cannot upgrade application "contrail-webui" to charm "local:trusty/contrail-webui-2": The update path 'bindings.' contains an empty field name, which is not allowed.

So it deploys successfully and all bindings are correct, but then upgrade fails.

Ante Karamatić (ivoks) wrote :

Also, removing this application or any unit of this application is now impossible. Removing other applications/units works just fine. Once one tries upgrade-charm with --path or --switch, app cannot be removed.

Sandor Zeestraten (szeestraten) wrote :

@ivoks I created #1671476 for being unable to remove the application/unit when it is in this upgrading state.

John A Meinel (jameinel) wrote : Re: [Bug 1671428] Re: deploying with default binding prevents upgrade-charm

Right. The app is now part of an invalid transaction, so it prevents any
further changes to that application until you purge the invalid transaction
(see above for the database commands you can run to fix the txn log.)

Once upgraded to 2.1.2 you should not be able to get into this situation,
thus the workaround shouldn't be needed anymore.

John
=:->

Ante Karamatić (ivoks) wrote :

False alarm, sorry. Controller still had 2.1.1. Works just fine with 2.1.2:

ubuntu@maas:~⟫ juju upgrade-charm --path ~/build/charm-contrail-webui/trusty/contrail-webui contrail-webui --force-units
Added charm "local:trusty/contrail-webui-3" to the model.

John A Meinel (jameinel) wrote :

According to the discussion in IRC, comment #19 was actually with 2.1.1 (possibly a 2.1.2 client, but it was a 2.1.1 server, and the bug is in the server code.)

He confirmed that with a fresh 2.1.2 the bug is fixed.

Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released