Juju run command reaches controller but exits with "ERROR connection is shut down"

Bug #1895690 reported by Michael Skalka
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: 2.9-rc1

Bug Description

Using both Juju 2.8.1 and 2.8.2, a `juju run` command was issued against a unit. The controller's audit log shows the command being registered; however, the machine log does not show the command being executed. Instead, the log is full of transaction errors between the unit agent and what appears to be the controller.

Client side log:
...
2020-09-15-00:52:46 root ERROR [localhost] Command failed: juju run -m foundations-maas:openstack -u ceph-mon/0 'ceph osd pool create scbench 100 100'
2020-09-15-00:52:46 root ERROR [localhost] STDOUT follows:
ERROR connection is shut down

2020-09-15-00:52:46 root ERROR [localhost] STDERR follows:
None
...

...
subprocess.CalledProcessError: Command '['juju', 'run', '-m', 'foundations-maas:openstack', '-u', 'ceph-mon/0', 'ceph osd pool create scbench 100 100']' returned non-zero exit status 1.
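For reference, the failing invocation looks roughly like the sketch below. This is only a minimal sketch assuming the harness shells out to the juju CLI via subprocess, as the traceback suggests; the retry loop on "connection is shut down" is a hypothetical client-side workaround, not something the original test run does.

# Hypothetical reproduction/workaround sketch; the retry-on-transient-error
# behaviour is an assumption, not part of the original harness.
import subprocess
import time

CMD = [
    "juju", "run", "-m", "foundations-maas:openstack",
    "-u", "ceph-mon/0", "ceph osd pool create scbench 100 100",
]

def run_with_retry(cmd, attempts=3, delay=10):
    for _ in range(attempts):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 0:
            return proc.stdout
        # Only retry the specific transient failure seen in this bug.
        if "connection is shut down" not in (proc.stdout + proc.stderr):
            break
        time.sleep(delay)
    raise subprocess.CalledProcessError(proc.returncode, cmd, proc.stdout, proc.stderr)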

Controller log:
...
{"request":{"conversation-id":"5e5e18b88e66bf14","connection-id":"435","request-id":2,"when":"2020-09-15T00:11:58Z","facade":"Application","method":"Get","version":12}}
{"errors":{"conversation-id":"5e5e18b88e66bf14","connection-id":"435","request-id":2,"when":"2020-09-15T00:11:58Z","errors":null}}
{"conversation":{"who":"admin","what":"/snap/juju/14127/bin/juju run -m foundations-maas:openstack -u ceph-mon/0 \"ceph osd pool create scbench 100 100\"","when":"2020-09-15T00:52:29Z","model-name":"openstack","model-uuid":"14326eba-7b46-4004-8d3b-600330ec0d25","conversation-id":"79f10a3fcd0b12df","connection-id":"512"}}
{"request":{"conversation-id":"79f10a3fcd0b12df","connection-id":"512","request-id":2,"when":"2020-09-15T00:52:29Z","facade":"Action","method":"Run","version":6}}
{"errors":{"conversation-id":"79f10a3fcd0b12df","connection-id":"512","request-id":2,"when":"2020-09-15T00:52:32Z","errors":[null]}}
{"request":{"conversation-id":"79f10a3fcd0b12df","connection-id":"512","request-id":3,"when":"2020-09-15T00:52:32Z","facade":"Action","method":"Actions","version":6}}
{"errors":{"conversation-id":"79f10a3fcd0b12df","connection-id":"512","request-id":3,"when":"2020-09-15T00:52:45Z","errors":[null]}}
...
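The request/error pairs for the failed run can be isolated by filtering the audit log on the run's conversation-id (79f10a3fcd0b12df above). A minimal sketch, assuming the log is the JSON-lines format shown in the excerpt with one record per line under a single top-level key; the file name audit.log is a placeholder:

# Sketch: collect every audit-log entry belonging to one conversation.
import json

def entries_for_conversation(path, conversation_id):
    matches = []
    with open(path) as f:
        for line in f:
            entry = json.loads(line)
            # Each line wraps one record ("conversation", "request", or "errors").
            record = next(iter(entry.values()))
            if record.get("conversation-id") == conversation_id:
                matches.append(entry)
    return matches

for e in entries_for_conversation("audit.log", "79f10a3fcd0b12df"):
    print(json.dumps(e))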

machine-3-lxd-1.log:

/build/snapcraft-juju-35d6cf/parts/juju/src/api/client.go:683:
/build/snapcraft-juju-35d6cf/parts/juju/src/api/logsender/logsender.go:75: cannot send log message
/build/snapcraft-juju-35d6cf/parts/juju/src/worker/logsender/worker.go:69:
2020-09-15 00:52:32 ERROR juju.api.watcher watcher.go:92 error trying to stop watcher: txn watcher sync error
2020-09-15 00:52:32 ERROR juju.api.watcher watcher.go:92 error trying to stop watcher: txn watcher sync error
2020-09-15 00:52:32 DEBUG juju.worker.dependency engine.go:598 "fan-configurer" manifold worker stopped: txn watcher sync error
2020-09-15 00:52:32 ERROR juju.worker.dependency engine.go:671 "fan-configurer" manifold worker returned unexpected error: txn watcher sync error
2020-09-15 00:52:32 DEBUG juju.worker.dependency engine.go:673 stack trace:
txn watcher sync error
/build/snapcraft-juju-35d6cf/parts/juju/src/rpc/client.go:178:
/build/snapcraft-juju-35d6cf/parts/juju/src/api/apiclient.go:1200:
2020-09-15 00:52:32 DEBUG juju.worker.dependency engine.go:598 "upgrader" manifold worker stopped: txn watcher sync error
2020-09-15 00:52:32 ERROR juju.worker.dependency engine.go:671 "upgrader" manifold worker returned unexpected error: txn watcher sync error
2020-09-15 00:52:32 ERROR juju.api.watcher watcher.go:92 error trying to stop watcher: txn watcher sync error
2020-09-15 00:52:32 DEBUG juju.worker.dependency engine.go:673 stack trace:
txn watcher sync error
...
...
/build/snapcraft-juju-35d6cf/parts/juju/src/rpc/client.go:178:
/build/snapcraft-juju-35d6cf/parts/juju/src/api/apiclient.go:1200:
2020-09-15 00:52:32 DEBUG juju.api monitor.go:35 RPC connection died
2020-09-15 00:52:32 ERROR juju.api.watcher watcher.go:92 error trying to stop watcher: codec.ReadHeader error: error receiving message: read tcp 10.246.65.58:37060->10.246.64.253:17070: read: connection reset by peer
2020-09-15 00:52:32 DEBUG juju.rpc server.go:328 error closing codec: write tcp 10.246.65.58:37060->10.246.64.253:17070: write: broken pipe
2020-09-15 00:52:32 ERROR juju.api.watcher watcher.go:92 error trying to stop watcher: codec.ReadHeader error: error receiving message: read tcp 10.246.65.58:37060->10.246.64.253:17070: read: connection reset by peer

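To check whether the worker restarts line up with the failed run, the "txn watcher sync error" occurrences in the machine log can be tallied by timestamp. A rough sketch; the log path is a placeholder for the file in the artifacts linked below:

# Sketch: count timestamped "txn watcher sync error" lines in a machine log.
import re
from collections import Counter

PATTERN = re.compile(r"^(\S+ \S+) .*txn watcher sync error")

def sync_error_times(path):
    times = []
    with open(path, errors="replace") as f:
        for line in f:
            m = PATTERN.match(line)
            if m:
                times.append(m.group(1))
    return times

times = sync_error_times("machine-3-lxd-1.log")
print(len(times), "occurrences")
print(Counter(times).most_common(5))
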
Test run can be found here:
https://solutions.qa.canonical.com/openstack/testRun/0d08cb68-872e-4880-a8b3-10a54bfd99f4
Artifacts here:
https://oil-jenkins.canonical.com/artifacts/0d08cb68-872e-4880-a8b3-10a54bfd99f4/index.html

Pen Gale (pengale) wrote:

I'm not sure that we have a clean repro case here, but we do have extensive logs (thx, mskalka!)

Adding this to the 2.9 milestone, as I'd like us to take a shot at squashing it at the end of this cycle.

Changed in juju:
milestone: none → 2.9-beta1
status: New → Triaged
importance: Undecided → Medium
Changed in juju:
milestone: 2.9-beta1 → 2.9-rc1
Pen Gale (pengale) wrote:

Marking as a duplicate of the other txn watcher sync error bug, as we believe both have the same root cause, possibly due to a re-used connection.
