Machine agent never connects

Bug #1867037 reported by Casey Marshall
This bug affects 1 person
Affects         Status  Importance  Assigned to  Milestone
Canonical Juju  New     Undecided   Unassigned

Bug Description

I tried deploying a couple of charms to the same machine in prodstack (I wanted to colocate them to save resources and tear them down afterwards, as it's an experiment), and the machine agent fails to start.

status is stuck at:

Every 2.0s: juju status test-kafkaconsumer-sync    Wed Mar 11 18:18:22 2020

Model              Controller    Cloud/Region                 Version  SLA          Timestamp
prod-event-bus-ua  prodstack-is  prodstack-45/bootstack-ps45  2.6.10   unsupported  18:18:22Z

App                       Version  Status   Scale  Charm          Store       Rev  OS      Notes
test-kafkaconsumer-kpidb           waiting      0  postgresql     jujucharms  203  ubuntu
test-kafkaconsumer-sync            waiting    0/1  eventbus-sync  jujucharms    7  ubuntu

Unit                       Workload  Agent       Machine  Public address  Ports  Message
test-kafkaconsumer-sync/0  waiting   allocating  26       10.15.122.31           waiting for machine

Machine  State  DNS           Inst id                               Series  AZ                Message
26       down   10.15.122.31  02d2f69d-f8cc-4b11-8b6d-8d269df5008c  bionic  prodstack-zone-1  ACTIVE

The machine log on the unit shows:

2020-03-11 18:13:01 INFO juju.cmd supercommand.go:57 running jujud [2.6.10 gc go1.11.13]
2020-03-11 18:13:01 DEBUG juju.cmd supercommand.go:58 args: []string{"/var/lib/juju/tools/machine-26/jujud", "machine", "--data-dir", "/var/lib/juju", "--machine-id", "26", "--debug"}
2020-03-11 18:13:01 DEBUG juju.utils gomaxprocs.go:24 setting GOMAXPROCS to 1
2020-03-11 18:13:01 DEBUG juju.agent agent.go:545 read agent config, format "2.0"
2020-03-11 18:13:01 INFO juju.worker.upgradesteps worker.go:73 upgrade steps for 2.6.10 have already been run.
2020-03-11 18:13:01 DEBUG juju.worker.dependency engine.go:565 "agent" manifold worker started at 2020-03-11 18:13:01.423914109 +0000 UTC
2020-03-11 18:13:01 DEBUG juju.worker.dependency engine.go:565 "clock" manifold worker started at 2020-03-11 18:13:01.424021588 +0000 UTC
2020-03-11 18:13:01 DEBUG juju.worker.dependency engine.go:565 "termination-signal-handler" manifold worker started at 2020-03-11 18:13:01.428908494 +0000 UTC
2020-03-11 18:13:01 DEBUG juju.worker.dependency engine.go:565 "upgrade-steps-gate" manifold worker started at 2020-03-11 18:13:01.428959269 +0000 UTC
2020-03-11 18:13:01 DEBUG juju.worker.dependency engine.go:565 "upgrade-check-gate" manifold worker started at 2020-03-11 18:13:01.429814446 +0000 UTC
2020-03-11 18:13:01 DEBUG juju.worker.introspection socket.go:97 introspection worker listening on "@jujud-machine-26"
2020-03-11 18:13:01 DEBUG juju.worker.introspection socket.go:127 stats worker now serving
2020-03-11 18:13:01 DEBUG juju.worker.apicaller connect.go:128 connecting with old password
2020-03-11 18:13:01 DEBUG juju.worker.dependency engine.go:565 "upgrade-check-flag" manifold worker started at 2020-03-11 18:13:01.455409165 +0000 UTC
2020-03-11 18:13:01 DEBUG juju.worker.dependency engine.go:565 "upgrade-steps-flag" manifold worker started at 2020-03-11 18:13:01.455592313 +0000 UTC
2020-03-11 18:13:01 DEBUG juju.worker.dependency engine.go:565 "state-config-watcher" manifold worker started at 2020-03-11 18:13:01.462832887 +0000 UTC
2020-03-11 18:13:01 DEBUG juju.worker.dependency engine.go:565 "api-config-watcher" manifold worker started at 2020-03-11 18:13:01.462953916 +0000 UTC
2020-03-11 18:13:01 DEBUG juju.api apiclient.go:1092 successfully dialed "wss://10.25.2.109:17070/model/6d00dc5a-d8e0-4587-841b-5192686650f3/api"
2020-03-11 18:13:01 INFO juju.api apiclient.go:624 connection established to "wss://10.25.2.109:17070/model/6d00dc5a-d8e0-4587-841b-5192686650f3/api"
2020-03-11 18:13:02 DEBUG juju.worker.apicaller connect.go:155 [6d00dc] failed to connect
2020-03-11 18:13:02 ERROR juju.worker.apicaller connect.go:204 Failed to connect to controller: invalid entity name or password (unauthorized access)
2020-03-11 18:13:02 DEBUG juju.worker.dependency engine.go:599 "api-caller" manifold worker stopped: [6d00dc] "machine-26" cannot open api: connection permanently impossible
2020-03-11 18:13:02 DEBUG juju.worker.dependency engine.go:585 "termination-signal-handler" manifold worker completed successfully
2020-03-11 18:13:02 DEBUG juju.worker.dependency engine.go:585 "clock" manifold worker completed successfully
2020-03-11 18:13:02 DEBUG juju.worker.dependency engine.go:585 "api-config-watcher" manifold worker completed successfully
2020-03-11 18:13:02 DEBUG juju.worker.dependency engine.go:585 "agent" manifold worker completed successfully
2020-03-11 18:13:02 DEBUG juju.worker.dependency engine.go:585 "upgrade-steps-flag" manifold worker completed successfully
2020-03-11 18:13:02 DEBUG juju.worker.dependency engine.go:585 "upgrade-check-flag" manifold worker completed successfully
2020-03-11 18:13:02 INFO juju.worker.stateconfigwatcher manifold.go:119 tomb dying
2020-03-11 18:13:02 DEBUG juju.worker.dependency engine.go:585 "state-config-watcher" manifold worker completed successfully
2020-03-11 18:13:02 DEBUG juju.worker.dependency engine.go:585 "upgrade-steps-gate" manifold worker completed successfully
2020-03-11 18:13:02 DEBUG juju.worker.dependency engine.go:585 "upgrade-check-gate" manifold worker completed successfully
2020-03-11 18:13:02 INFO cmd supercommand.go:502 command finished
2020-03-11 18:13:02 DEBUG juju.cmd.jujud main.go:200 jujud complete, code 0, err <nil>
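For triage, the failure is easier to spot with the DEBUG noise filtered out. Here is a minimal sketch, assuming the line layout seen in the log above (date, time, level, module, file:line, message). The function name `find_failures` and the regular expression are illustrative only and are not part of Juju.

```python
import re

# Assumed layout: "<date> <time> <LEVEL> <module> <file>:<line> <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<module>\S+) (?P<source>\S+:\d+) (?P<message>.*)$"
)


def find_failures(log_text):
    """Return (time, level, module, message) for ERROR lines and stopped workers."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines or anything not in the assumed layout
        # "worker stopped:" marks a worker that exited with an error, as opposed
        # to "completed successfully".
        if match["level"] == "ERROR" or "worker stopped:" in match["message"]:
            hits.append(
                (match["time"], match["level"], match["module"], match["message"])
            )
    return hits
```

Run against the log above, this should leave only the `juju.worker.apicaller` ERROR line and the `"api-caller" manifold worker stopped` line, which together show the agent reached the controller but was rejected at login.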

Richard Harding (rharding) wrote :

This looks like a dupe of 1853080. Marking it so but let me know if you disagree. Hopefully getting to 2.7.2 or later will address it as expected.

Casey Marshall (cmars) wrote :

I think it might be possible to reproduce this by deploying two units co-located on the same machine at roughly the same time -- before the machine is provisioned.

I force-removed the stuck machine above and added a unit first to one of the applications, then another unit on the same machine after the unit agent provisioned, and it seems to work. So I think this is maybe a race condition.

Tim Penhey (thumper) wrote : Re: [Bug 1867037] Re: Machine agent never connects

It is a race between two running workers both trying to provision machines.
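To illustrate how such a race could end in the "invalid entity name or password" error from the log, here is a toy model. It rests on an assumption that is not confirmed anywhere in this thread: that each provisioning worker generates its own password for the machine, and that the controller keeps whichever password was written last. The classes and functions are hypothetical and are not Juju code.

```python
import secrets


class Controller:
    """Hypothetical stand-in for the controller's per-machine credential store."""

    def __init__(self):
        self._passwords = {}

    def set_password(self, machine_id, password):
        # Last write wins: nothing stops a second worker from overwriting.
        self._passwords[machine_id] = password

    def login(self, machine_id, password):
        return self._passwords.get(machine_id) == password


def start_instance(controller, machine_id):
    """One worker's view: generate a password, record it, hand it to the instance."""
    password = secrets.token_hex(16)
    controller.set_password(machine_id, password)
    return password  # what would end up in that instance's agent config


def simulate_race(machine_id="26"):
    """Interleave two workers provisioning the same machine (assumed scenario)."""
    controller = Controller()
    pw_from_worker_a = start_instance(controller, machine_id)  # instance that boots
    pw_from_worker_b = start_instance(controller, machine_id)  # overwrites the record
    return (
        controller.login(machine_id, pw_from_worker_a),
        controller.login(machine_id, pw_from_worker_b),
    )
```

In this model `simulate_race()` returns `(False, True)`: the agent that booted with worker A's password is rejected because the controller now holds worker B's. If the assumption holds, that would also fit the observation that adding the second unit only after the machine was up avoided the problem.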

On Thu, Mar 12, 2020 at 7:40 AM Casey Marshall <email address hidden>
wrote:

> *** This bug is a duplicate of bug 1853080 ***
> https://bugs.launchpad.net/bugs/1853080
