Bootstrap node occasionally panicking with "not a valid unit name"
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| juju-core | Fix Released | High | Frank Mueller | |
| juju-core 1.24 | | High | Frank Mueller | |
Bug Description
Since upgrading from 1.22.something to 1.23-beta1 on Tuesday, my vivid local provider bootstrap node's jujud has been panicking several times a day. It usually happens on destroy-service, but I've seen it on a deploy as well. It is not reliably reproducible.
The tail of the log after a destroy-service panic:
2015-03-26 22:30:49 WARNING juju.lease lease.go:301 A notification timed out after 1m0s.
2015-03-27 00:25:28 ERROR juju.apiserver debuglog.go:110 debug-log handler error: write tcp 127.0.0.1:56076: broken pipe
2015-03-27 01:01:36 ERROR juju.rpc server.go:573 error writing response: write tcp 10.0.3.153:59431: broken pipe
panic: cannot retrieve unit "m#15#n#
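An aside on the panic string: "m#15#n#..." looks like a machine-scoped state key (machine 15's opened-ports document, if the usual juju key scheme applies) rather than a unit name, which would explain why a unit-name check rejects it. Below is a minimal self-contained sketch of that mismatch; the regex, the helper name, and the "eth0" completion of the truncated key are illustrative assumptions, not juju's actual names package:

```go
package main

import (
	"fmt"
	"regexp"
)

// unitNamePattern approximates juju's "service/number" unit-name format.
// This is an assumption for illustration; the real validation lives in
// juju's names package.
var unitNamePattern = regexp.MustCompile(`^[a-z][a-z0-9-]*[a-z0-9]/[0-9]+$`)

func isValidUnitName(name string) bool {
	return unitNamePattern.MatchString(name)
}

func main() {
	// "m#15#n#eth0" is a hypothetical completion of the truncated key
	// from the log. It is machine-scoped ("m#..."), so it can never look
	// like "postgresql/0" and a unit-name check must reject it.
	for _, name := range []string{"m#15#n#eth0", "postgresql/0"} {
		fmt.Printf("%q valid unit name: %v\n", name, isValidUnitName(name))
	}
}
```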
| Dimiter Naydenov (dimitern) wrote : | #1 |
Can you paste some logs (preferably at TRACE level) and explain what commands you've run?
| Dimiter Naydenov (dimitern) wrote : | #2 |
At first glance this looks related to a recent change in the megawatcher (backingOpenedPorts).
| William Grant (wgrant) wrote : | #3 |
I'll get logs next time it falls over. I've been deploying, redeploying, upgrading, relating, destroying, unrelating (and everything else you can think of) a selection of apache2, haproxy, gunicorn, nrpe, storage, and a couple of private charms. On one particular occasion it panicked right as I destroy-service'd a live instance of lp:~canonical-launchpad-branches/charms/trusty/turnipcake/devel
| Changed in juju-core: | |
| status: | New → Triaged |
| importance: | Undecided → Critical |
| importance: | Critical → Medium |
| tags: | added: deploy destroy-service |
| tags: | added: destroy-machine |
| William Grant (wgrant) wrote : | #4 |
Reproduced with trace logging. I "juju destroy-service"'d all of the services in the environment, and watched "juju status" until it started hanging.
| William Grant (wgrant) wrote : | #5 |
machine-0: panic: cannot retrieve unit "m#3#n#
machine-0: goroutine 1464 [running]:
machine-0: runtime.
machine-0: #011/usr/
machine-0: github.
machine-0: #011/build/
machine-0: github.
machine-0: #011/build/
machine-0: github.
machine-0: #011/build/
machine-0: github.
machine-0: #011/build/
machine-0: created by github.
machine-0: #011/build/
| Stuart Bishop (stub) wrote : | #6 |
The test suite in lp:~stub/charms/postgresql/enable-integration-tests seems to reliably trigger this with the 1.23 release and the local provider. 'make integration_
| Stuart Bishop (stub) wrote : | #7 |
I can repeat this using:
juju bootstrap
juju deploy cs:postgresql
juju deploy cs:postgresql-psql psql
juju add-relation postgresql:db psql:db
juju wait
juju-deployer -T
I haven't reproduced this using 'juju destroy-service'.
| Changed in juju-core: | |
| importance: | Medium → High |
| Changed in juju-core: | |
| milestone: | none → 1.24.0 |
| Ian Booth (wallyworld) wrote : | #8 |
Frank, the recent work to add the removed() method to backingOpenedPorts does not properly process the incoming id. See the updated() method and how it calls backingEntityId
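Based on Ian's description, here is a rough sketch of the shape of the fix: removed() should translate the incoming global key into an entity id the way updated() does via the backingEntityId... helper named above, rather than treating the raw key as a unit name. The key formats, the helper's signature, and the entityId type below are assumptions for illustration, not the actual megawatcher code:

```go
package main

import (
	"fmt"
	"strings"
)

// entityId stands in for the megawatcher's entity identifier; the real
// type in juju differs, this is illustrative only.
type entityId struct {
	kind string
	id   string
}

// backingEntityIdForGlobalKey sketches what the comment says updated()
// already does: map a state global key ("m#15", "u#psql/0", ...) to an
// entity id instead of treating the raw key as a name. The key formats
// here are assumptions.
func backingEntityIdForGlobalKey(key string) (entityId, bool) {
	parts := strings.Split(key, "#")
	if len(parts) < 2 {
		return entityId{}, false
	}
	switch parts[0] {
	case "m":
		return entityId{kind: "machine", id: parts[1]}, true
	case "u":
		return entityId{kind: "unit", id: parts[1]}, true
	}
	return entityId{}, false
}

func main() {
	// A machine-scoped ports key like the one in the panic ("eth0" is a
	// hypothetical completion of the truncated key) resolves to a machine
	// entity; per the comment above, the buggy removed() skipped this
	// translation step entirely.
	for _, key := range []string{"m#15#n#eth0", "u#psql/0", "bogus"} {
		if id, ok := backingEntityIdForGlobalKey(key); ok {
			fmt.Printf("%s -> %s %s\n", key, id.kind, id.id)
		} else {
			fmt.Printf("%s -> no entity\n", key)
		}
	}
}
```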
| Changed in juju-core: | |
| assignee: | nobody → Frank Mueller (themue) |
| milestone: | 1.24.0 → 1.24-beta2 |
| Changed in juju-core: | |
| status: | Triaged → In Progress |
| Changed in juju-core: | |
| milestone: | 1.24-beta2 → 1.24-beta3 |
| Changed in juju-core: | |
| milestone: | 1.24-beta3 → 1.25.0 |
| Antonio Rosales (arosales) wrote : | #9 |
Note, this bug is also affecting Charm CI as reported in https:/
-thanks,
Antonio
| Changed in juju-core: | |
| status: | In Progress → Fix Committed |
| Changed in juju-core: | |
| status: | Fix Committed → Fix Released |

