Deployed one of our functional test bundles https://paste.ubuntu.com/p/xtBZnTp8PW/ on a fresh 2.9.8 controller and ran into a situation where unit agents are reported as "allocating" per the controller state, whereas local `juju_unit_status` invocations show the unit in the "running" state. At the same time, there are no unit agent processes running (the machine agent is fine).
Dependency engine report: https://paste.ubuntu.com/p/9KZtHsXHjS/
Goroutines: https://paste.ubuntu.com/p/M9w4tRMZ9n/
Full Juju status: https://paste.ubuntu.com/p/mDnxYhcKgf/ (note that some units are fine).
mongodump: https://private-fileshare.canonical.com/~dima/juju-dumps/2021-07-13-mongodump-1626178792
I was able to reproduce the same issue twice (did a controller redeployment just in case).
ubuntu@dmitriis-bastion:~$ juju status
Model Controller Cloud/Region Version SLA Timestamp
zaza-634b66967daa dmitriis-serverstack serverstack/serverstack 2.9.8 unsupported 11:47:11Z
# ...
ceph-osd/0* waiting allocating 8 10.5.1.251 agent initializing
ceph-osd/1 waiting allocating 9 10.5.3.223 agent initializing
ceph-osd/2 waiting allocating 10 10.5.2.5 agent initializing
# ...
Machine State DNS Inst id Series AZ Message
# ...
8 started 10.5.1.251 41acc793-3cf9-4148-a9eb-a6f22dfaef0f focal nova ACTIVE
9 started 10.5.3.223 b8cab859-23c3-4ae0-b9b9-d41cc2913f02 focal nova ACTIVE
10 started 10.5.2.5 1fe227cd-2cf7-4236-a439-1e8327c6a7a2 focal nova ACTIVE
# ...
ubuntu@juju-9b611f-zaza-634b66967daa-8:/var/log$ systemctl list-unit-files | grep juju
jujud-machine-8.service enabled enabled
ubuntu@juju-9b611f-zaza-634b66967daa-8:/var/log$ pgrep -af juju
2178 bash /etc/systemd/system/jujud-machine-8-exec-start.sh
2183 /var/lib/juju/tools/machine-8/jujud machine --data-dir /var/lib/juju --machine-id 8 --debug
ubuntu@juju-9b611f-zaza-634b66967daa-8:/var/log$ grep -RiP ceph-osd
juju/machine-8.log:2021-07-13 11:20:01 INFO juju.worker.deployer deployer.go:161 checking unit "ceph-osd/0"
juju/machine-8.log:2021-07-13 11:20:01 INFO juju.worker.deployer deployer.go:204 deploying unit "ceph-osd/0"
juju/machine-8.log:2021-07-13 11:20:02 INFO juju.worker.deployer unit_agent.go:121 creating new agent config for "ceph-osd/0"
juju/machine-8.log:2021-07-13 11:20:02 INFO juju.worker.deployer nested.go:354 starting workers for "ceph-osd/0"
juju/machine-8.log:2021-07-13 11:20:02 DEBUG juju.worker.leadership tracker.go:125 ceph-osd/0 making initial claim for ceph-osd leadership
juju/machine-8.log:2021-07-13 11:20:02 INFO juju.worker.leadership tracker.go:194 ceph-osd/0 promoted to leadership of ceph-osd
juju/machine-8.log:2021-07-13 11:20:02 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-ceph-osd-0
juju/machine-8.log:2021-07-13 11:20:02 INFO juju.downloader download.go:111 downloading from cs:~openstack-charmers-next/ceph-osd-494
juju/machine-8.log:2021-07-13 11:20:02 INFO juju.downloader download.go:94 download complete ("cs:~openstack-charmers-next/ceph-osd-494")
juju/machine-8.log:2021-07-13 11:20:02 INFO juju.downloader download.go:174 download verified ("cs:~openstack-charmers-next/ceph-osd-494")
juju/unit-ceph-osd-0.log:2021-07-13 11:20:02 INFO juju unit_agent.go:253 Starting unit workers for "ceph-osd/0"
juju/unit-ceph-osd-0.log:2021-07-13 11:20:02 INFO juju.worker.apicaller connect.go:158 [a06a92] "unit-ceph-osd-0" successfully connected to "10.5.1.74:17070"
juju/unit-ceph-osd-0.log:2021-07-13 11:20:02 INFO juju.worker.apicaller connect.go:255 [a06a92] password changed for "unit-ceph-osd-0"
juju/unit-ceph-osd-0.log:2021-07-13 11:20:02 INFO juju.worker.apicaller connect.go:158 [a06a92] "unit-ceph-osd-0" successfully connected to "10.5.1.74:17070"
juju/unit-ceph-osd-0.log:2021-07-13 11:20:02 INFO juju.worker.uniter uniter.go:328 unit "ceph-osd/0" started
juju/unit-ceph-osd-0.log:2021-07-13 11:20:02 INFO juju.worker.uniter.charm bundles.go:79 downloading cs:~openstack-charmers-next/ceph-osd-494 from API server
juju/machine-lock.log:2021-07-13 11:20:02 === agent unit-ceph-osd-0 started ===
juju/machine-lock.log:2021-07-13 11:20:02 unit-ceph-osd-0: meterstatus (meter-status-changed), waited 0s, held 0s
ubuntu@juju-9b611f-zaza-634b66967daa-8:/var/log$ ls -al /var/lib/juju/agents/
total 16
drwxr-xr-x 4 root root 4096 Jul 13 11:20 .
drwxr-xr-x 6 root root 4096 Jul 13 11:20 ..
drwxr-xr-x 2 root root 4096 Jul 13 11:20 machine-8
drwxr-xr-x 4 root root 4096 Jul 13 11:20 unit-ceph-osd-0
ubuntu@juju-9b611f-zaza-634b66967daa-8:/var/log$ sudo ss -apx | grep juju
u_str LISTEN 0 4096 @jujud-machine-8 27612 * 0 users:(("jujud",pid=2183,fd=7))
u_str LISTEN 0 4096 /var/lib/juju/agents/unit-ceph-osd-0/914117156/s 28290 * 0 users:(("jujud",pid=2183,fd=14))
u_str LISTEN 0 4096 /var/lib/juju/agents/unit-ceph-osd-0/805790685/s 28365 * 0 users:(("jujud",pid=2183,fd=12))
ubuntu@juju-9b611f-zaza-634b66967daa-8:/var/log$ ps 2183
PID TTY STAT TIME COMMAND
2183 ? SLl 0:00 /var/lib/juju/tools/machine-8/jujud machine --data-dir /var/lib/juju --machine-id 8 --debug
ubuntu@juju-9b611f-zaza-634b66967daa-8:/var/log$ sudo lsof -p 2183
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
jujud 2183 root cwd DIR 252,1 4096 2 /
jujud 2183 root rtd DIR 252,1 4096 2 /
jujud 2183 root txt REG 252,1 155099136 258070 /var/lib/juju/tools/2.9.8-ubuntu-amd64/jujud
jujud 2183 root 0r CHR 1,3 0t0 6 /dev/null
jujud 2183 root 1w REG 252,1 16374 258201 /var/log/juju/machine-8.log
jujud 2183 root 2w REG 252,1 16374 258201 /var/log/juju/machine-8.log
jujud 2183 root 3u a_inode 0,14 0 10376 [eventpoll]
jujud 2183 root 4r FIFO 0,13 0t0 27489 pipe
jujud 2183 root 5w FIFO 0,13 0t0 27489 pipe
jujud 2183 root 6w REG 252,1 16374 258201 /var/log/juju/machine-8.log
jujud 2183 root 7u unix 0xffff8f9a391e2c00 0t0 27612 @jujud-machine-8 type=STREAM
jujud 2183 root 8u IPv4 27658 0t0 TCP juju-9b611f-zaza-634b66967daa-8.project.serverstack:44964->juju-05185a-controller-0.project.serverstack:17070 (ESTABLISHED)
jujud 2183 root 9u IPv4 27657 0t0 TCP juju-9b611f-zaza-634b66967daa-8.project.serverstack:44962->juju-05185a-controller-0.project.serverstack:17070 (ESTABLISHED)
jujud 2183 root 10w REG 252,1 1217 258199 /var/log/juju/unit-ceph-osd-0.log
jujud 2183 root 11u IPv4 28276 0t0 TCP juju-9b611f-zaza-634b66967daa-8.project.serverstack:44968->juju-05185a-controller-0.project.serverstack:17070 (ESTABLISHED)
jujud 2183 root 12u unix 0xffff8f9a391e2400 0t0 28365 /var/lib/juju/agents/unit-ceph-osd-0/805790685/s type=STREAM
jujud 2183 root 13u IPv4 28289 0t0 TCP juju-9b611f-zaza-634b66967daa-8.project.serverstack:44970->juju-05185a-controller-0.project.serverstack:17070 (ESTABLISHED)
jujud 2183 root 14u unix 0xffff8f9a391e3000 0t0 28290 /var/lib/juju/agents/unit-ceph-osd-0/914117156/s type=STREAM
ubuntu@juju-9b611f-zaza-634b66967daa-8:/var/log$ juju_unit_status
agent: machine-8
units:
ceph-osd/0: running
From what I can see, unit agents are no longer started as separate processes and are instead run within the machine agent.
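This can be confirmed from the `ss -apx` output above: the unit's abstract sockets are owned by the machine agent's PID, not by a separate jujud process. A small sketch of that check, parsing a captured `ss` line from this report (on a live machine you would pipe `sudo ss -apx | grep juju` in instead):

```shell
# Sketch: verify the unit-agent socket is held by the machine agent process.
# The sample line and PID below are taken from the output earlier in this report.
ss_output='u_str LISTEN 0 4096 /var/lib/juju/agents/unit-ceph-osd-0/914117156/s 28290 * 0 users:(("jujud",pid=2183,fd=14))'
machine_pid=2183   # PID of jujud-machine-8 from `pgrep -af juju`

# Extract the PID that owns the unit socket.
sock_pid=$(echo "$ss_output" | grep -oP 'pid=\K[0-9]+')

if [ "$sock_pid" = "$machine_pid" ]; then
  echo "unit socket held by machine agent (pid $machine_pid)"
else
  echo "unit socket held by a separate process (pid $sock_pid)"
fi
```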
After enabling TRACE logging, it looks like the uniter's resolver is blocked waiting for storage to be attached.
juju model-config logging-config='<root>=TRACE;unit=TRACE;juju.state=TRACE;juju.worker.uniter.resolver=TRACE;'
root@juju-9b611f-zaza-634b66967daa-8:~# systemctl restart jujud-machine-8.service
https://paste.ubuntu.com/p/gb7FTXQ4Px/
2021-07-13 13:12:30 INFO juju.worker.uniter uniter.go:328 unit "ceph-osd/0" started
uniter.remotestate watcher.go:511 got application change for ceph-osd/0
uniter.remotestate watcher.go:584 got address change for ceph-osd/0: ok=true, hashes=[20c16e5e8646abbbefa7f5cf61702bc22fb50fec602ee7b2992a3a1a4934b0e9]
uniter.remotestate watcher.go:613 got relations change for ceph-osd/0: ok=true
uniter.remotestate watcher.go:501 got unit change for ceph-osd/0
uniter.remotestate watcher.go:563 got trust config change for ceph-osd/0: ok=true, hashes=[7b6f0296d4721da093763e03dc8df64bd05de8b3e3bdecd8220fc10cf4b1171d]
uniter.remotestate watcher.go:595 got leader settings change for ceph-osd/0: ok=true
uniter.remotestate watcher.go:633 got update status interval change for ceph-osd/0: ok=true
uniter.remotestate watcher.go:623 got storage change for ceph-osd/0: [osd-devices/0] ok=true
uniter.remotestate watcher.go:574 got upgrade series change
uniter.remotestate watcher.go:724 no upgrade series in progress, reinitializing local upgrade series state
uniter.remotestate watcher.go:605 got action change for ceph-osd/0: [] ok=true
uniter.remotestate watcher.go:552 got config change for ceph-osd/0: ok=true, hashes=[94f0bb34b42ffa8222dae077ca74cc42b969701068774be908d6024961b1bdd1]
uniter.resolver loop.go:190 charmdir: available=false opState: started=false stopped=false changing=false
uniter.upgradeseries resolver.go:48 localState.Kind="run-hook", localState.UpgradeSeriesStatus="not started", remoteState.UpgradeSeriesStatus="not started"
uniter.storage resolver.go:174 next hook op for storage-osd-devices-0: {Kind:0 Life:alive Attached:false Location:}
uniter.storage resolver.go:138 still pending [storage-osd-devices-0]
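The tail of that trace is the key part: the storage resolver sees `Attached:false` for osd-devices/0 and keeps reporting "still pending", so the uniter never progresses to the install hook. A sketch for pulling the pending storage IDs out of a saved trace log (the log line format is taken from the output above):

```shell
# Sketch: extract the storage attachments the storage resolver is still
# waiting on, from a captured TRACE line (format as in the paste above).
log='uniter.storage resolver.go:138 still pending [storage-osd-devices-0]'

pending=$(echo "$log" | grep -oP 'still pending \[\K[^]]+')
echo "pending storage: $pending"
```

On the live model the same attachment state should be visible via `juju storage` / `juju show-storage osd-devices/0`, which would show whether the attachment is stuck on the controller side.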
2021-07-13 13:12:30 DEBUG juju.worker.uniter runlistener.go:118 juju-run listener running
2021-07-13 13:12:30 INFO juju.worker.uniter uniter.go:346 hooks are retried false
# ... (remaining DEBUG/TRACE lines truncated in this excerpt; full trace in the paste above)