`juju exec --unit` doesn't work against a manually added machine

Bug #2008526 reported by Simon Déziel
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Triaged
High
Unassigned

Bug Description

tl;dr: If you `juju add-machine ssh:...` a machine then deploy a unit to it, you won't be able to `juju exec --unit` to it. However `juju exec --machine` works.

Long description:

I've deployed a LXD VM (essentially `lxc launch images:ubuntu/22.04 microk8s`), added openssh-server to it, injected `~/.local/share/juju/ssh/juju_id_rsa.pub` to the ubuntu@microk8s account and then `juju add-machine ssh:10.184.166.111` it. After that, I deployed the microk8s charm with `juju deploy microk8s -n1 --base jammy`.

This seemed to have worked fine according to `juju status`:

$ juju status
Model Controller Cloud/Region Version SLA Timestamp
microk8s overlord juju-lxd/default 3.1.0 unsupported 16:59:13-05:00

App Version Status Scale Charm Channel Rev Exposed Message
microk8s active 1 microk8s stable 27 no

Unit Workload Agent Machine Public address Ports Message
microk8s/0* active idle 1 10.184.166.111 80,443,16443/tcp

Machine State Address Inst id Base AZ Message
1 started 10.184.166.111 manual:10.184.166.111 ubuntu@22.04 Manually provisioned machine

However, `juju exec --unit` keeps timing out:

$ juju exec --debug --unit microk8s/0 -- hostname
17:01:21 INFO juju.cmd supercommand.go:56 running juju [3.1.0 924c9e190eb56d9a9324d9cc0dd9dd663c501ac4 gc go1.18.10]
17:01:21 DEBUG juju.cmd supercommand.go:57 args: []string{"/snap/juju/22136/bin/juju", "exec", "--debug", "--unit", "microk8s/0", "--", "hostname"}
17:01:21 INFO juju.juju api.go:86 connecting to API addresses: [10.184.166.216:17070]
17:01:21 DEBUG juju.api apiclient.go:1167 successfully dialed "wss://10.184.166.216:17070/model/4968e5e9-fc42-4c24-86e0-b7c4f8b78717/api"
17:01:21 INFO juju.api apiclient.go:702 connection established to "wss://10.184.166.216:17070/model/4968e5e9-fc42-4c24-86e0-b7c4f8b78717/api"
17:01:21 INFO cmd common.go:375 Running operation 55 with 1 task
17:01:21 INFO cmd common.go:375 - task 56 on unit-microk8s-0
17:01:21 INFO cmd common.go:375
17:01:21 INFO cmd common.go:375 Waiting for task 56...
17:06:21 DEBUG juju.api monitor.go:35 RPC connection died
ERROR timed out waiting for results from: unit microk8s/0
17:06:21 DEBUG cmd supercommand.go:548 error stack:
github.com/juju/juju/cmd/juju/action.(*runCommandBase).handleTimeout:355: timed out waiting for results from: unit microk8s/0
github.com/juju/juju/cmd/juju/action.(*runCommandBase).processOperationResults:227:

But `juju exec --machine` works OK:

$ juju exec --wait 10s --debug --machine 1 -- hostname
17:11:21 INFO juju.cmd supercommand.go:56 running juju [3.1.0 924c9e190eb56d9a9324d9cc0dd9dd663c501ac4 gc go1.18.10]
17:11:21 DEBUG juju.cmd supercommand.go:57 args: []string{"/snap/juju/22136/bin/juju", "exec", "--wait", "10s", "--debug", "--machine", "1", "--", "hostname"}
17:11:21 INFO juju.juju api.go:86 connecting to API addresses: [10.184.166.216:17070]
17:11:21 DEBUG juju.api apiclient.go:1167 successfully dialed "wss://10.184.166.216:17070/model/4968e5e9-fc42-4c24-86e0-b7c4f8b78717/api"
17:11:21 INFO juju.api apiclient.go:702 connection established to "wss://10.184.166.216:17070/model/4968e5e9-fc42-4c24-86e0-b7c4f8b78717/api"
17:11:21 INFO cmd common.go:375 Running operation 57 with 1 task
17:11:21 INFO cmd common.go:375 - task 58 on machine-1
17:11:21 INFO cmd common.go:375
17:11:21 INFO cmd common.go:375 Waiting for task 58...
microk8s
17:11:23 DEBUG juju.api monitor.go:35 RPC connection died
17:11:23 INFO cmd supercommand.go:555 command finished

There isn't much interesing in the controller logs:

$ juju debug-log -m controller --level error --replay
machine-0: 21:58:35 ERROR juju.mongo could not set the value of "/proc/sys/net/core/netdev_max_backlog" to "1000" because of: "/proc/sys/net/core/netdev_max_backlog" does not exist, will not set "1000"
machine-0: 21:58:35 ERROR juju.mongo could not set the value of "/sys/kernel/mm/transparent_hugepage/enabled" to "never" because of: open /sys/kernel/mm/transparent_hugepage/enabled: permission denied
machine-0: 21:58:35 ERROR juju.mongo could not set the value of "/sys/kernel/mm/transparent_hugepage/defrag" to "never" because of: open /sys/kernel/mm/transparent_hugepage/defrag: permission denied
machine-0: 21:58:35 ERROR juju.mongo could not set the value of "/proc/sys/net/core/somaxconn" to "16384" because of: "/proc/sys/net/core/somaxconn" does not exist, will not set "16384"
machine-0: 21:58:35 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [963919] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
machine-0: 21:58:40 ERROR juju.worker.dependency "lxd-container-provisioner" manifold worker returned unexpected error: container types not yet available
machine-0: 21:58:40 ERROR juju.worker.dependency "broker-tracker" manifold worker returned unexpected error: no container types determined
machine-0: 21:58:40 ERROR juju.worker.logger connection is shut down
machine-0: 21:58:40 ERROR juju.worker.dependency "kvm-container-provisioner" manifold worker returned unexpected error: container types not yet available
machine-0: 21:58:40 ERROR juju.worker.dependency "broker-tracker" manifold worker returned unexpected error: no container types determined
machine-0: 21:58:40 ERROR juju.worker.dependency "lxd-container-provisioner" manifold worker returned unexpected error: container types not yet available
unit-controller-0: 21:58:41 ERROR juju.worker.meterstatus error running "meter-status-changed": charm missing from disk
machine-0: 22:10:31 ERROR juju.apiserver debug-log handler error: sending failed: write tcp 10.184.166.216:17070->10.184.166.1:47090: write: broken pipe
machine-0: 22:10:32 ERROR juju.rpc error writing response: *tls.permanentError write tcp 10.184.166.216:17070->10.184.166.199:59234: write: connection reset by peer
machine-0: 22:10:32 ERROR juju.rpc error writing response: *tls.permanentError write tcp 10.184.166.216:17070->10.184.166.199:59234: write: connection reset by peer
machine-0: 22:10:32 ERROR juju.rpc error writing response: *tls.permanentError write tcp 10.184.166.216:17070->10.184.166.199:59234: write: connection reset by peer
machine-0: 22:10:32 ERROR juju.rpc error writing response: *tls.permanentError write tcp 10.184.166.216:17070->10.184.166.199:59234: write: connection reset by peer
machine-0: 22:10:32 ERROR juju.rpc error writing response: *tls.permanentError write tcp 10.184.166.216:17070->10.184.166.199:59234: write: connection reset by peer
machine-0: 22:10:32 ERROR juju.rpc error writing response: *tls.permanentError write tcp 10.184.166.216:17070->10.184.166.199:59234: write: connection reset by peer
machine-0: 22:10:32 ERROR juju.rpc error writing response: *tls.permanentError write tcp 10.184.166.216:17070->10.184.166.199:59234: write: connection reset by peer
machine-0: 22:10:32 ERROR juju.rpc error writing response: *tls.permanentError write tcp 10.184.166.216:17070->10.184.166.199:59234: write: connection reset by peer
machine-0: 22:10:32 ERROR juju.rpc error writing response: *tls.permanentError write tcp 10.184.166.216:17070->10.184.166.199:59234: write: connection reset by peer
machine-0: 15:21:46 ERROR juju.mongo could not set the value of "/sys/kernel/mm/transparent_hugepage/defrag" to "never" because of: open /sys/kernel/mm/transparent_hugepage/defrag: permission denied
machine-0: 15:21:46 ERROR juju.mongo could not set the value of "/proc/sys/net/core/somaxconn" to "16384" because of: "/proc/sys/net/core/somaxconn" does not exist, will not set "16384"
machine-0: 15:21:46 ERROR juju.mongo could not set the value of "/proc/sys/net/core/netdev_max_backlog" to "1000" because of: "/proc/sys/net/core/netdev_max_backlog" does not exist, will not set "1000"
machine-0: 15:21:46 ERROR juju.mongo could not set the value of "/sys/kernel/mm/transparent_hugepage/enabled" to "never" because of: open /sys/kernel/mm/transparent_hugepage/enabled: permission denied
machine-0: 15:21:46 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [963919] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
machine-0: 15:21:50 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [963919] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
machine-0: 15:26:02 ERROR juju.mongo could not set the value of "/sys/kernel/mm/transparent_hugepage/enabled" to "never" because of: open /sys/kernel/mm/transparent_hugepage/enabled: permission denied
machine-0: 15:26:02 ERROR juju.mongo could not set the value of "/sys/kernel/mm/transparent_hugepage/defrag" to "never" because of: open /sys/kernel/mm/transparent_hugepage/defrag: permission denied
machine-0: 15:26:02 ERROR juju.mongo could not set the value of "/proc/sys/net/core/somaxconn" to "16384" because of: "/proc/sys/net/core/somaxconn" does not exist, will not set "16384"
machine-0: 15:26:02 ERROR juju.mongo could not set the value of "/proc/sys/net/core/netdev_max_backlog" to "1000" because of: "/proc/sys/net/core/netdev_max_backlog" does not exist, will not set "1000"
machine-0: 15:26:02 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [963919] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
machine-0: 15:26:06 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [963919] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
machine-0: 15:26:10 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [963919] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
machine-0: 15:52:14 ERROR juju.worker.dependency "environ-tracker" manifold worker returned unexpected error: model "microk8s" (9426dd00-5211-4881-8dcc-70e4addb6ca8): reading model config: model "9426dd00-5211-4881-8dcc-70e4addb6ca8": settings not found (not found)
machine-0: 15:52:14 ERROR juju.worker.dependency "undertaker" manifold worker returned unexpected error: model "9426dd00-5211-4881-8dcc-70e4addb6ca8" not found (not found)
machine-0: 09:59:49 ERROR juju.mongo could not set the value of "/proc/sys/net/core/somaxconn" to "16384" because of: "/proc/sys/net/core/somaxconn" does not exist, will not set "16384"
machine-0: 09:59:49 ERROR juju.mongo could not set the value of "/proc/sys/net/core/netdev_max_backlog" to "1000" because of: "/proc/sys/net/core/netdev_max_backlog" does not exist, will not set "1000"
machine-0: 09:59:49 ERROR juju.mongo could not set the value of "/sys/kernel/mm/transparent_hugepage/enabled" to "never" because of: open /sys/kernel/mm/transparent_hugepage/enabled: permission denied
machine-0: 09:59:49 ERROR juju.mongo could not set the value of "/sys/kernel/mm/transparent_hugepage/defrag" to "never" because of: open /sys/kernel/mm/transparent_hugepage/defrag: permission denied
machine-0: 09:59:50 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [963919] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
machine-0: 09:59:53 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [963919] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
machine-0: 09:59:58 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [963919] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
machine-0: 16:15:49 ERROR juju.apiserver debug-log handler error: sending failed: write tcp 10.184.166.216:17070->10.184.166.1:53632: write: broken pipe
machine-0: 16:17:07 ERROR juju.apiserver debug-log handler error: sending failed: write tcp 10.184.166.216:17070->10.184.166.1:58112: write: broken pipe
machine-0: 16:17:07 ERROR juju.apiserver debug-log handler error: sending failed: write tcp 10.184.166.216:17070->10.184.166.1:49726: write: broken pipe
machine-0: 16:28:12 ERROR juju.apiserver debug-log handler error: sending failed: write tcp 10.184.166.216:17070->10.184.166.1:55592: write: broken pipe

Those "broken pipe" seemed related but they don't correlate with failed `juju exec` as I've run 10s of failed/timed out exec.

This has be discussed and initially triaged by @hmlanigan (thanks!) here https://chat.charmhub.io/charmhub/channels/juju/wyzbo4ezjtdiuriypeyxjf8mah

Additional information:

$ snap list juju
Name Version Rev Tracking Publisher Notes
juju 3.1.0 22136 3.1/stable canonical✓ -

Simon Déziel (sdeziel)
description: updated
tags: added: add-machine add-unit exec
Changed in juju:
importance: Undecided → High
status: New → Triaged
tags: added: ssh
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.