juju-core

Bootstrap node occasionally panicking with "not a valid unit name"

Bug #1437266 reported by William Grant on 2015-03-27

This bug affects 5 people

Affects		Status	Importance	Assigned to	Milestone
	juju-core	Fix Released	High	Frank Mueller	juju-core 1.25-alpha1
	1.24	Fix Released	High	Frank Mueller	juju-core 1.24-beta3

Bug Description

Since upgrading from 1.22.something to 1.23-beta1 on Tuesday, my vivid local provider bootstrap node's jujud has been panicking several times a day. It usually happens on destroy-service, but I've also seen it on a deploy. It is not reliably reproducible.

The tail of the log after a destroy-service panic:

2015-03-26 22:30:49 WARNING juju.lease lease.go:301 A notification timed out after 1m0s.
2015-03-27 00:25:28 ERROR juju.apiserver debuglog.go:110 debug-log handler error: write tcp 127.0.0.1:56076: broken pipe
2015-03-27 01:01:36 ERROR juju.rpc server.go:573 error writing response: write tcp 10.0.3.153:59431: broken pipe
panic: cannot retrieve unit "m#15#n#juju-public": "m#15#n#juju-public" is not a valid unit name

Dimiter Naydenov (dimitern) wrote on 2015-03-27:

Can you paste some logs (preferably at TRACE level) and explain what commands you've run?

Dimiter Naydenov (dimitern) wrote on 2015-03-27:

At first glance this looks related to a recent change in the megawatcher (the backingOpenedPorts.removed method), which cleans up opened ports for units when the unit goes away. What units have you deployed?
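A minimal Go sketch of the suspected failure mode, assuming (as the stack trace later in this report suggests) that when an openedPorts document is deleted, removed() receives the raw document id and then treats that id as a unit name. `validUnitName`, `lookupUnit`, `removedBuggy`, and `tryRemoved` are hypothetical simplifications for illustration, not juju code; juju's real unit-name validation lives in its names package and differs in detail.

```go
package main

import (
	"fmt"
	"regexp"
)

// validUnitName is a hypothetical, simplified stand-in for juju's unit
// name check: "<service>/<number>", e.g. "postgresql/0".
var validUnitName = regexp.MustCompile(`^[a-z][a-z0-9]*(-[a-z0-9]*[a-z][a-z0-9]*)*/[0-9]+$`)

// lookupUnit stands in for fetching a unit from state by name; here it
// only validates the name.
func lookupUnit(name string) error {
	if !validUnitName.MatchString(name) {
		return fmt.Errorf("%q is not a valid unit name", name)
	}
	return nil
}

// removedBuggy mimics the faulty pattern: the id of the deleted
// openedPorts document is passed straight to a unit lookup, and any
// error is escalated to a panic.
func removedBuggy(id string) {
	if err := lookupUnit(id); err != nil {
		panic(fmt.Sprintf("cannot retrieve unit %q: %v", id, err))
	}
}

// tryRemoved runs removedBuggy and returns the panic message, or "" if
// it did not panic.
func tryRemoved(id string) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	removedBuggy(id)
	return ""
}

func main() {
	// An opened-ports key, as seen in the logs, triggers the panic:
	// cannot retrieve unit "m#15#n#juju-public": "m#15#n#juju-public" is not a valid unit name
	fmt.Println(tryRemoved("m#15#n#juju-public"))
	// A genuine unit name passes validation (prints true).
	fmt.Println(tryRemoved("postgresql/0") == "")
}
```

The message it produces matches the one in the report. The point is only that an opened-ports key can never pass unit-name validation, so handing one to a unit lookup cannot succeed.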

William Grant (wgrant) wrote on 2015-03-27:

I'll get logs next time it falls over. I've been deploying, redeploying, upgrading, relating, destroying, unrelating (and everything else you can think of) a selection of apache2, haproxy, gunicorn, nrpe, storage, and a couple of private charms. On one particular occasion it panicked right as I destroy-service'd a live instance of lp:~canonical-launchpad-branches/charms/trusty/turnipcake/devel

Curtis Hovey (sinzui) on 2015-03-27

Changed in juju-core:
status:	New → Triaged
importance:	Undecided → Critical
importance:	Critical → Medium
tags:	added: deploy destroy-service

Curtis Hovey (sinzui) on 2015-04-22

tags:	added: destroy-machine

William Grant (wgrant) wrote on 2015-04-30:

trace of the panic (2.1 MiB, text/plain)

Reproduced with trace logging. I ran "juju destroy-service" on all of the services in the environment and watched "juju status" until it started hanging.

William Grant (wgrant) wrote on 2015-04-30:

machine-0: panic: cannot retrieve unit "m#3#n#juju-public": "m#3#n#juju-public" is not a valid unit name
machine-0: goroutine 1464 [running]:
machine-0: runtime.panic(0x131f0c0, 0xc20b9e04f0)
machine-0: 	/usr/lib/go/src/pkg/runtime/panic.c:279 +0xf5
machine-0: github.com/juju/juju/state.(*backingOpenedPorts).removed(0xc20bf0c180, 0xc20814ae80, 0xc2085e1840, 0x10bd720, 0xc20b73d160)
machine-0: 	/build/buildd/juju-core-1.23-beta4/src/github.com/juju/juju/state/megawatcher.go:585 +0x1e6
machine-0: github.com/juju/juju/state.(*allWatcherStateBacking).Changed(0xc20890b410, 0xc2085e1840, 0xc20b73d130, 0xb, 0x10bd720, 0xc20b73d160, 0xffffffffffffffff, 0x0, 0x0)
machine-0: 	/build/buildd/juju-core-1.23-beta4/src/github.com/juju/juju/state/megawatcher.go:815 +0x46a
machine-0: github.com/juju/juju/state.(*storeManager).loop(0xc208728190, 0x0, 0x0)
machine-0: 	/build/buildd/juju-core-1.23-beta4/src/github.com/juju/juju/state/multiwatcher.go:189 +0x2d5
machine-0: github.com/juju/juju/state.func·028()
machine-0: 	/build/buildd/juju-core-1.23-beta4/src/github.com/juju/juju/state/multiwatcher.go:158 +0x65
machine-0: created by github.com/juju/juju/state.newStoreManager
machine-0: 	/build/buildd/juju-core-1.23-beta4/src/github.com/juju/juju/state/multiwatcher.go:167 +0x80

Stuart Bishop (stub) wrote on 2015-05-01:

panic.log (362.6 KiB, text/plain)

The test suite in lp:~stub/charms/postgresql/enable-integration-tests seems to reliably trigger this with the 1.23 release and the local provider: run 'make integration_test_93' against a bootstrapped environment. all-machines.log is attached.

Stuart Bishop (stub) wrote on 2015-05-01:

I can repeat this using:

juju bootstrap
juju deploy cs:postgresql
juju deploy cs:postgresql-psql psql
juju add-relation postgresql:db psql:db
juju wait
juju-deployer -T

I haven't reproduced this using 'juju destroy-service'.

Martin Packman (gz) on 2015-05-11

Changed in juju-core:
importance:	Medium → High

Curtis Hovey (sinzui) on 2015-05-11

Changed in juju-core:
milestone:	none → 1.24.0

Ian Booth (wallyworld) wrote on 2015-05-12:

Frank, the recent work to add the removed() method to backingOpenedPorts does not properly process the incoming id. See the updated() method and how it calls backingEntityIdForOpenedPortsKey() for how to do it.
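A minimal Go sketch of the direction described above: decode the id before using it, instead of treating it as a unit name. It assumes the key layout is `m#<machine-id>#n#<network-name>`, as the panic messages in this report suggest. `portsKey` and `parseOpenedPortsKey` are hypothetical names for illustration and do not reproduce the signature or behaviour of juju's backingEntityIdForOpenedPortsKey.

```go
package main

import (
	"fmt"
	"strings"
)

// portsKey holds the parts of an openedPorts document key. The layout
// "m#<machine-id>#n#<network-name>" is an assumption inferred from the
// panic messages in this bug report.
type portsKey struct {
	MachineID   string
	NetworkName string
}

// parseOpenedPortsKey (hypothetical) splits a key such as
// "m#15#n#juju-public" into its machine id and network name, and
// reports anything else as an error rather than panicking.
func parseOpenedPortsKey(key string) (portsKey, error) {
	// Machine ids may contain "/" (e.g. "0/lxc/1") but not "#".
	parts := strings.SplitN(key, "#", 4)
	if len(parts) != 4 || parts[0] != "m" || parts[2] != "n" || parts[1] == "" || parts[3] == "" {
		return portsKey{}, fmt.Errorf("%q is not a valid opened ports key", key)
	}
	return portsKey{MachineID: parts[1], NetworkName: parts[3]}, nil
}

func main() {
	k, err := parseOpenedPortsKey("m#15#n#juju-public")
	fmt.Println(k.MachineID, k.NetworkName, err) // 15 juju-public <nil>

	_, err = parseOpenedPortsKey("postgresql/0")
	fmt.Println(err) // "postgresql/0" is not a valid opened ports key
}
```

Returning an error for a malformed id, rather than panicking, also matters because an unrecovered panic in any goroutine terminates the whole jujud process.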

Changed in juju-core:
assignee:	nobody → Frank Mueller (themue)
milestone:	1.24.0 → 1.24-beta2

Frank Mueller (themue) on 2015-05-12

Changed in juju-core:
status:	Triaged → In Progress

Curtis Hovey (sinzui) on 2015-05-12

Changed in juju-core:
milestone:	1.24-beta2 → 1.24-beta3

Ian Booth (wallyworld) on 2015-05-12

Changed in juju-core:
milestone:	1.24-beta3 → 1.25.0

Antonio Rosales (arosales) wrote on 2015-05-13:

Note: this bug is also affecting Charm CI, as reported in https://bugs.launchpad.net/juju-core/+bug/1454359. If at all possible, I suggest this be targeted at 1.24.

-thanks,
Antonio

Frank Mueller (themue) on 2015-05-14

Changed in juju-core:
status:	In Progress → Fix Committed

Curtis Hovey (sinzui) on 2015-05-20

Changed in juju-core:
status:	Fix Committed → Fix Released
